This book will focus on Advanced Excel techniques to solve real life problems. It is also a reference to the understand of Array Formulas. With Array Formulas you can solve problems that look impossible without the use of VBA. It will be of great value to those with intermediate knowledge of Excel, wanting to learn more advanced features and master Array Formulas.
CONTENTS About the Author ................................................................................................................................ 2-6 1
Introduction.................................................................................................................................. 2-6 1.1
What is an Array? ................................................................................................................. 2-6
1.2
What is an Array Formula? ................................................................................................... 2-6
1.3
Pros and Cons of Array Formulas ......................................................................................... 2-7
1.3.1
Pros ................................................................................................................................ 2-7
1.3.2
Cons ............................................................................................................................... 2-7
1.4
Do we always need to <Ctrl+Shift+Enter>? .......................................................................... 2-8
1.5
Using conditions ................................................................................................................... 2-8
1.5.1
Replacing AND by multiplication (*) and OR by addition (+)........................................... 2-8
1.5.2
Arrays of conditions ........................................................................................................ 2-9
1.6
Time considerations when converting TRUE and FALSE to numeric ................................. 2-10
1.6.1 2
Golden rules .............................................................................................................................. 2-11 2.1
Golden rule #1 – Step by step ............................................................................................ 2-11
2.1.1 2.2 2.3
Names starting and ending with the same letter........................................................... 2-14
Golden rule #3 – If possible check for a non-array solution ................................................ 2-15
2.3.1
Names starting with “R” or ending with “S” ................................................................... 2-16
Conditional aggregation............................................................................................................. 3-19 3.1
CountIf, ContIfs, SumIf, SumIfs, AverageIf, AverageIfs ...................................................... 3-19
3.1.1 3.2
4
First word and last word ............................................................................................... 2-11
Golden rule #2 – Create the array first ................................................................................ 2-14
2.2.1
3
How many rows with equal values in column B and C ................................................. 2-10
Summing the values greater than the average ............................................................. 3-19
Array formulas, solutions for conditional aggregation ......................................................... 3-19
3.2.1
Summing the even values ............................................................................................ 3-19
3.2.2
Standard deviating the even values ............................................................................. 3-22
3.2.3
The Aggregate Function ............................................................................................... 3-23
3.2.4
Smallest even value ..................................................................................................... 3-24
Auxiliary Techniques ................................................................................................................. 4-27 4.1
Position of the last non-empty cell in a range ..................................................................... 4-27
4.1.1
No empty cells between data. ...................................................................................... 4-27
4.1.2
Empty cells between data ............................................................................................ 4-27
4.2
Position of the first empty cell ............................................................................................. 4-31
2-1
4.3
Returning row numbers....................................................................................................... 4-32
4.4
Returning row numbers in reverse order............................................................................. 4-33
4.4.1 4.5
5
Array Constants .................................................................................................................. 4-33
4.5.1
Array constant containing digits 0 to 9.......................................................................... 4-33
4.5.2
Array constant containing letters “A” to “Z” ................................................................... 4-34
4.6
Naming a formula................................................................................................................ 4-35
4.7
Naming First Word and Last Word formulas ....................................................................... 4-37
The most frequent value ............................................................................................................ 5-38 5.1
Range with numeric values ................................................................................................. 5-38
5.1.1
One Mode .................................................................................................................... 5-38
5.1.2
More than one Mode .................................................................................................... 5-38
5.2
6
List of values in reverse order ...................................................................................... 4-33
Range with mixed values (not only numeric) ...................................................................... 5-39
5.2.1
One mode .................................................................................................................... 5-39
5.2.2
More than one Mode .................................................................................................... 5-40
Text Stuff ................................................................................................................................... 6-42 6.1
Most frequent character ...................................................................................................... 6-42
6.1.1
Most frequent character, case sensitive ....................................................................... 6-42
6.1.2
Most frequent character, case sensitive, ignoring spaces ............................................ 6-44
6.1.3
Most frequent character, not case sensitive, ignoring spaces ...................................... 6-44
6.1.4
Most frequent “A” to “Z” letter, not case sensitive ......................................................... 6-45
6.2
Number of distinct characters ............................................................................................. 6-49
6.2.1
Number of distinct characters, space included ............................................................. 6-49
6.2.2
Number of distinct characters, space ignored .............................................................. 6-51
6.2.3
Number of distinct “A” to “Z” letters, case sensitive ...................................................... 6-52
6.2.4
Number of distinct “A” to “Z” letters, not case sensitive ................................................ 6-52
6.3
Obtaining the nth word from text .......................................................................................... 6-52
6.3.1
Formula solution ........................................................................................................... 6-52
6.3.2
VBA Solution ................................................................................................................ 6-53
6.4
Text Split ............................................................................................................................. 6-55
6.5
Text Join (with VBA alternative solution)............................................................................. 6-55
6.5.1
Inserting spaces between each letter ........................................................................... 6-56
6.5.2
Removing irregular spacing .......................................................................................... 6-57
6.6
Number of words in a text column ...................................................................................... 6-57
2-2
7
6.7
Biggest string in a text column ............................................................................................ 6-58
6.8
Sum of squares of individual digits in a number.................................................................. 6-59
6.9
Credit Card Validation ......................................................................................................... 6-60
Random values without repetitions ............................................................................................ 7-63 7.1
7.1.1
Using an auxiliary range and the Rank function ........................................................... 7-63
7.1.2
No auxiliary range ........................................................................................................ 7-65
7.1.3
VBA versatile solution .................................................................................................. 7-67
7.2
8
Distinct integer random numbers between two values ........................................................ 7-69
7.2.1
Between 1 and N .......................................................................................................... 7-69
7.2.2
Between K and N ......................................................................................................... 7-69
7.2.3
VBA solution ................................................................................................................. 7-70
Put things in order with formulas ............................................................................................... 8-73 8.1
Ordering numeric data ........................................................................................................ 8-73
8.1.1
Ascending order ........................................................................................................... 8-73
8.1.2
Descending order ......................................................................................................... 8-74
8.2
Ordering text data ............................................................................................................... 8-75
8.2.1
Using an auxiliary range ............................................................................................... 8-75
8.2.2
No auxiliary range ........................................................................................................ 8-77
8.3
Top N values and corresponding value in another column ................................................. 8-78
8.3.1
Manual, no formulas solution ....................................................................................... 8-78
8.3.2
Non-array formula solution ........................................................................................... 8-82
8.3.3
Using a Table and Structured References ................................................................... 8-84
8.3.4
Array formula solution .................................................................................................. 8-85
8.3.5
Using Tables and Queries ............................................................................................ 8-86
8.4
9
Extracted from a range ....................................................................................................... 7-63
Check if a column range is ordered .................................................................................... 8-92
8.4.1
Ascending order ........................................................................................................... 8-92
8.4.2
Ascending, Descending, or not ordered. ...................................................................... 8-94
Overcoming Lookup and Reference functions limitations .......................................................... 9-95 9.1
Only one Lookup value limitation ........................................................................................ 9-95
9.2
First Match limitation ........................................................................................................... 9-98
9.2.1
Non-array solution with an auxiliary range ................................................................... 9-99
9.2.2
Non-array solution with no auxiliary range ................................................................. 9-101
9.2.3
Array solution with no auxiliary range ......................................................................... 9-102
2-3
9.2.4 10
Distinct values .................................................................................................................... 10-105
10.1
Number of distinct numeric values ............................................................................... 10-105
10.1.1
Frequency solution ................................................................................................ 10-105
10.1.2
Countif Solution ..................................................................................................... 10-107
10.1.3
Match Solution ...................................................................................................... 10-109
10.1.4
Using a Table and Structured References ............................................................ 10-110
10.1.5
Timing the solutions .............................................................................................. 10-111
10.2
Number of distinct text values...................................................................................... 10-111
10.2.1
Frequency/Match solution ..................................................................................... 10-111
10.2.2
Countif Solution ..................................................................................................... 10-113
10.2.3
Match Solution ...................................................................................................... 10-113
10.2.4
Timing solutions .................................................................................................... 10-114
10.3
11
Converting to a table and using Structured References ............................................. 9-104
Obtaining the distinct values ........................................................................................ 10-115
10.3.1
Manual method – Using Remove Duplicates ........................................................ 10-115
10.3.2
Manual method – Using Advanced Filter .............................................................. 10-117
10.3.3
Match with Aggregate formula solution (standard references) .............................. 10-118
10.3.4
Match with Aggregate formula solution (structured references) ............................ 10-120
10.3.5
Using a Pivot Table ............................................................................................... 10-120
Conditional Formatting ....................................................................................................... 11-125
11.1
Credit Card Validation.................................................................................................. 11-125
11.2
Changing Active Row Background .............................................................................. 11-126
11.3
Student names in bold for the best Score in each Class ............................................. 11-130
12
Data Validation ................................................................................................................... 12-132
12.1
Credit Card Validation.................................................................................................. 12-132
12.2
Making sure we don’t enter repeated order numbers .................................................. 12-134
12.3
Creating Drop Down Lists ............................................................................................ 12-135
13
12.3.1
A list of Countries – not adaptable ........................................................................ 12-135
12.3.2
A list of Countries – adaptable (using a formula) .................................................. 12-136
12.3.3
A list of Countries – adaptable (using Tables)....................................................... 12-137
12.3.4
City Drop Down content depending on selected country in Country Drop Down .. 12-138
Dates and Array Formulas ................................................................................................. 13-140
13.1
Array formula to obtain all dates between two given dates .......................................... 13-140
13.2
How many Fridays the 13th between two given dates ................................................. 13-141
2-4
13.3
How many months with 5 complete weekends between two given dates? .................. 13-143
13.4
How many months with 5 complete Friday, Saturday, Sunday sequences ................. 13-147
14
A Model for Multiple Choice Tests..................................................................................... 14-148
14.1
With no penalty for wrong answers .............................................................................. 14-148
14.2
With penalty for wrong answers ................................................................................... 14-151
14.2.1 15
A Shorter more efficient solution ........................................................................... 14-153
Excel 365 ........................................................................................................................... 15-154
15.1
Filter Function .............................................................................................................. 15-154
15.2
Unique Function .......................................................................................................... 15-154
15.3
Sort Function ............................................................................................................... 15-155
2-5
ABOUT THE AUTHOR Carlos Rondão teaches Advanced Excel/VBA in Master Programs and Computer Science and Technology Information Systems in the Undergraduate Degree in Management and Economics. He also works as a consultant in those areas. His areas of interest focus on Excel and advanced Excel programming using VBA. He is the author of many Excel VBA games such Bejeweled, BubbleCell, CellTetris, Berulex, etc. You can contact me here
1 INTRODUCTION This book will focus on Advanced Excel techniques to solve real life problems. It is also a reference to the understand of Array Formulas. With Array Formulas you can solve problems that look impossible without using VBA. It will be of great value to those with intermediate knowledge of Excel, wanting to learn more advanced features and master Array Formulas. Depending on your regional settings, you will need to replace, in this book examples, the “;” argument delimiter by “,”
1.1 WHAT IS AN ARRAY? An array is a set of 2 or more elements.
1.2 WHAT IS AN ARRAY FORMULA? When a formula returns an array, we say it is an Array Formula. Excel functions like Frequency, Transpose, Mmult, Mdeterm are natural Array Formulas, but, almost any function, expecting a single value as argument, but used with an array, are Array Formulas. Figure 1-1 shows the use of the Transpose function. We want to transpose the B1:C3 matrix, as the original matrix is 3 rows by 2 columns, Transpose will return 2 rows by 3 columns matrix.
Figure 1-1
2-6
To enter the formula: 1 – select the destination range (E1:G2) 2 – write the formula (making sure all the destination is selected) 3 – Instead of just <Enter> press <Ctrl+Shift+Enter> In the next example, we are going to use IF with an array argument. The idea is to return, for the A1:B3 range, “YES” if the value in column A is equal to the column B value, “NO” otherwise. As we have 3 rows, the formula should return 3 values.
Figure 1-2
IF, naturally expects the first argument, to be just one condition, but we feed it with an array containing 3 conditions. The return array should contain 3 values, if A1=B1 »“YES” otherwise “NO”, if A2=B2 »“YES” otherwise “NO” and if A3=B3 »“YES” otherwise “NO”. To enter the formula: 1 – select the destination range (D1:D3) 2 – write the formula (making sure all the destination is selected) 3 – Instead of just <Enter> press <Ctrl+Shift+Enter> At this point you are probably thinking: why should I do that? I could write in D1 the formula =IF(A1=B1;”YES”;”NO”) and then copy down till D3. You are right, you will obtain the same result. For now, just think it as another way to do it, remember, however, that one method uses just one formula and the other uses 3 different formulas.
1.3 PROS AND CONS OF ARRAY FORMULAS 1.3.1 PROS -
You can solve problems that look impossible to solve without using VBA You can create much more structured models, you do not need to use so much auxiliary ranges
1.3.2 CONS -
Most of the time they are slower than other equivalent methods (this is especially true for big arrays) For some solutions, you need to <Ctrl+Shift+Enter>, if you forget that, the formula will return an error or a wrong value They are harder to read and understand by the majority of users
Note: if there is nothing to gain, and you can live with using other methods, do not use Array Formulas
2-7
1.4 DO WE ALWAYS NEED TO <CTRL+SHIFT+ENTER>? No, many solutions can avoid the need to <Ctrl+Shift+Enter> As loose rules we can say: -
If you are using Excel 365 you do not need to <Ctrl+Shift+Enter> Functions expecting an array as argument do not need <Ctrl+Shift+Enter>, that’s the case for Sumproduct, Mode, some Aggregate variations, etc… Functions expecting a single value as argument but used with an array, need <Ctrl+Shift+Enter> Functions expecting only a range as argument but used with an array, need <Ctrl+Shift+Enter> Functions using an array constant as argument do not need <Ctrl+Shift+Enter> The formula, =SUM(MOD(A1:A10;2)) (to count how many odd values in A1:A10) needs <Ctrl+Shift+Enter> The formula, =SUM(MOD({2;4;5;2;3;4;6;4;8;9};2)) (to count how many odd values in an array constant) does not need <Ctrl+Shift+Enter>
1.5 USING CONDITIONS When working with Array formulas it is, sometimes, useful to treat the result of a condition as a numeric integer value. A condition naturally returns TRUE or FALSE, but, as you can imagine, in the end, all kind of information is stored in memory as an Integer value: 0 --» represents the Boolean value FALSE 1 --» represents the Boolean value TRUE (TRUE is saved in memory as integer 1, but any integer value greater than 0 will be treated as TRUE, see Figure 1-4, cell A7) If we use mathematical operations with the result of conditions, Excel will automatically convert FALSE to 0 and TRUE to 1. Figure 1-4 illustrates this.
Figure 1-3
1.5.1 REPLACING AND BY MULTIPLICATION (*) AND OR BY ADDITION (+) Based on the above its easy to figure out that, for single value conditions, you can replace AND Boolean function by multiplication (*) and OR by addition (+). The AND function only returns TRUE if all the arguments are TRUE. If you multiply the conditions, as TRUE is replaced by 1 and FALSE by 0, the only way to obtain 1 is if all conditions are 1 (TRUE). The formula =AND(5>3;2<4) can be replaced by =(5>3)*(2<4) The OR function only returns FALSE if all arguments are FALSE. If you add the conditions, as TRUE is replaced by 1 and FALSE by 0, the only way to obtain 0 is if all conditions are 0 (FALSE).
2-8
However, it is possible to obtain a value greater than 1 if more than one condition is TRUE. Remember that any value greater than 0 will be treated as TRUE, but, in some cases, you need to take this into account. The formula = OR(5>3;2<4) can be replaced by =(5>3)+(2<4) (returning, in this case the value 2 --ÂťTRUE) Figure 1-4 illustrates this.
Figure 1-4
1.5.2 ARRAYS OF CONDITIONS What happens if, for instance, we have =AND(A1:A3=B1:B3;B1:B3=C1:C3) instead of =AND(A1=B1;B1=C1) ? This means the AND arguments are now arrays, not single conditions. Can we still use =(A1:A3=B1:B3)*(B1:B3=C1:C3) ? Yes, but now the two formulas will return two different things, and thatâ&#x20AC;&#x2122;s why the second is so important. In this case AND is still an Array formula and needs <Ctrl+Shift+Enter>, but returns only one value, the same as =AND(AND(A1=B1;B1=C1); AND(A2=B2;B2=C2); AND(A3=B3;B3=C3)) Multiplication, in this case, returns 3 different values, the first corresponds to (A1=B1)*(B1=C1), the second to (A2=B2)*(B2=C2) and the third to (A3=B3)*(B3=C3). Figure 6 illustrates this. (remember both formulas need <Ctrl+Shift+Enter>, if you forget to do it, AND will return AND(A1=B1;B1=C1))
Figure 1-5
If we sum the E4:E6 values (formula in G3) we can obtain the number of rows in A1:C3 with equal values. We can even do it without the need of E4:E6 range, just use the E4:E6 formula directly as argument of Sumproduct, this way you reduce the number of auxiliary ranges. We replace Sum by Sumproduct because, as you can see in Fiigure 3, Sumproduct does not need <Ctrl+Shift+Enter>, and with just an argument, they return the same.
2-9
1.6 TIME CONSIDERATIONS WHEN CONVERTING TRUE AND FALSE TO NUMERIC It is difficult to accurate measure a formula execution time. Many things can interfere: Processor interrupts to do other tasks, modern processors with turbo speeds, timing accuracy, etc. This is particularly true when trying to measure very small times. In cases like that it is best to repeat the process many times and then measure. We have done that in our Time It workbook, and you can use it for your own timings. It should only be used to compare different alternatives, don’t use it to accurately measure just one formula. To use Time It, select the formula or formulas you want to time, and then press <Time It> button. Time It also includes, in a module, the code for some of the VBA Functions used in this book. This symbol means you have an Excel file to download, use this one to download TimeIt
1.6.1 HOW MANY ROWS WITH EQUAL VALUES IN COLUMN B AND C We want to know, in range B1:C100, how many rows with equal values in column B and C. The formula B1:B100=C1=C100 will return an array of TRUE and FALSE values; B1=C1, B2=C2, B3=C3…B100=C100. In order to know how many are TRUE, we need to transform TRUE to 1 and FALSE to 0, and then sum. In order to avoid <Ctrl+Shift+Enter> we’re going to use Sumproduct instead of Sum. We’re going to use (1*), (--), (0+) and (/1)
In Figure 1-7 we can see the time differences, for the 4 methods, measured by the Time It Workbook. To test it yourself, you should write the 4 formulas in 4 different cells, and for each one, select the formula and press <Time It> button.
Figure 1-7 Eval/Sec – number of formula evaluations per second (bigger number means faster)
As you can see there is no big difference between times, and if you time it again, some of the relative positions will change. But the double negative (--) is consistently faster, and from now on, will be the one we use.
2-10
2 GOLDEN RULES 2.1 GOLDEN RULE #1 – STEP BY STEP For any big formula, array or not, always write it step by step. Try the formula for each step, advance to the next step only when you are sure that part is correct. Never try to write it all at once, if you have an error will be much more difficult to figure it out.
2.1.1 FIRST WORD AND LAST WORD
Figure 2-1 We want a formula in C2 to obtain the first word for B2 name and another in D2 to obtain the last word. The formula will be copied down for all the other (B3:B11) names.
First word Step 1 – Obtain the position for the first B2 space. (Find function) =FIND(" ";B2)
Figure 2-2 When copying the formula down we see an error in C9, Find returns an error because there is no space in B9. We will solve this later.
Step 2 – Obtain the text left of that position (Left function) =LEFT(B2;FIND(" ";B2)-1)
Figure 2-3 We use the Left function to obtain the text left from the space position, -1 because we don’t want to include the space.
2-11
Step 3 – Solve the error in C9 (IsError function) =IFERROR(LEFT(B2;FIND(" ";B2)-1);B2)
Figure 2-4 Copy the formula down and you see that the C9 error disappeared. The IfError function was used to solve that. If the first argument returns an error, IfError will return the second argument. In this case will be the B9 name.
Last word This is a very good example to show the Step by Step Method. We will end up with a big formula, but, it will be easy to write when done step by step. In this case, we need to find out the position of the last space. It will be more difficult because the Find function always returns the position of the first occurrence. The general idea is to replace the last space by something that will be unique in the text and then apply the previous First Word method, using, in the end, Right instead of Left. Step 1 – Find out the number of spaces (the method only works if we have just a space between words) Step 1-1 Remove all spaces (Substitute function) =SUBSTITUTE(B2;" ";"")
Figure 2-5 We use Substitute to replace all spaces by “Nothing” (empty string “”)
2-12
Step 1-2 Subtract the original length from the length after spaces removed. (Len function) =LEN(B2)-LEN(SUBSTITUTE(B2;" ";""))
Figure 2-6 Subtracting the lengths will return the total number of spaces.
Step 2 – Replace the last space by something that will be unique in a name, for instance “#” (Substitute function again, now using the last optional argument. When used means that we want to replace only that specific occurrence. For instance, if the previous formula returns 2, we want to replace only the second space. Remember the formula returns the number of spaces returning 2 means the second space is the last space). =SUBSTITUTE(B2;" ";"#";LEN(B2)-LEN(SUBSTITUTE(B2;" ";"")))
Figure 2-7 You can see that the last space was replaced by “#”. As there is no space in B9, C9 formula returns error.
2-13
Step 3 – Apply the First Name method (you can now use Find to obtain the “#” position, as we only have one “#” this will be the position of the last space) =IFERROR(RIGHT(B2;LEN(B2)-FIND("#";SUBSTITUTE(B2;" ";"#";LEN(B2)LEN(SUBSTITUTE(B2;" ";"")))));B2)
Figure 2-8 The final formula. Left was replaced by Right. We want the text after the last space position (P). The last argument of Right should be Len(B2) – P P – Position of last space, given by Find. In the end IfError was used again to solve the error in C9
2.2 GOLDEN RULE #2 – CREATE THE ARRAY FIRST If a formula needs an expression returning an array, create first the expression (using an auxiliary range) and make sure that the resulting array contains the values you expect. Later use the expression (copy/paste) in your end formula, now you can delete the auxiliary range.
2.2.1 NAMES STARTING AND ENDING WITH THE SAME LETTER Using Figure 1-1 data we want to know how many names start and end with the same letter. Step 1 – Create an array that will tell us, for each name, if it starts and ends with the same letter. (Left and Right functions) Select a range the size of your destination data, for instance D2:D11 Write the formula =LEFT(B2:B11)=RIGHT(B2:B11) and then <Ctrl+Shift+Enter>
Figure 2-9 The formula returns an array containing TRUE and FALSE. TRUE if the first and last letter are equal, FALSE otherwise.
Note that Excel is not case sensitive when comparing text
2-14
Step 2 – Apply double negative (--) to transform TRUE to 1 and FALSE to 0 =--(LEFT(B2:B11)=RIGHT(B2:B11)) <Ctrl+Shift+Enter>
Figure 2-10 The formula now returns an array containing zeros and ones. 1 if the first and last letter are equal, 0 otherwise. The sum of these zeros and ones is the solution to the problem
Step 3 – Copy the text from last formula, paste it as argument for the Sumproduct function, after this you can delete the D2:D11 auxiliary range. With Sumproduct we don’t need to <Ctrl+Shift+Enter>, that’s why we used it instead of Sum. The final formula (in C2) =SUMPRODUCT(--(LEFT(B2:B11)=RIGHT(B2:B11))) <Enter>
Figure 2-11 Just one formula to solve the problem. The D2:D11 range was useful to make sure that our array contains the correct data, now can be deleted. We just use the formula, that returns the array, as argument of Sumproduct.
Another way to see the array, used by Sumproduct, is to select the part that returns the array and press the F9 Key. Figure 2-12 shows just that. This solution only helps if we have a relatively small array.
Figure 2-12 Select --(LEFT(B2:B11)=RIGHT(B2:B11)) Press F9 key we can now see the expanded array returned by the selected formula.
2.3 GOLDEN RULE #3 – IF POSSIBLE CHECK FOR A NON-ARRAY SOLUTION If you can use a non-array, direct formula, use it, don’t use an array formula solution. It´s normal when one starts mastering array formulas, to use them for almost every solution. Remember however that, especially for big arrays, array formulas can be very slow, a non-array solution will probably be faster.
2-15
2.3.1 NAMES STARTING WITH “R” OR ENDING WITH “S” We want to calculate the number of students with names starting with “R” or ending with “S”. The use of WildCard characters (* and ?) in text criteria - examples for CountIf Starts with “A” =COUNTIF(B2:B11;”A*”) Ends with “A” =COUNTIF(B2:B11;”*A”) Contains an “A” =COUNTIF(B2:B11;”*A*”) Contains, at least, 2 “A” =COUNTIF(B2:B11;”*A*A*”) Third letter is an “A” =COUNTIF(B2:B11;”??A*”) Text with, exactly, 3 characters =COUNTIF(B2:B11;”???”) WildCard Characters (* and ?) can also be used in the first argument of Match and Vlookup CountIf (non-array solution) Remember CountIfs, SumIfs and AverageIfs can be used with more than one criteria, but only if the criteria are to be applied at the same time (AND), in this case we have an (OR) situation. But we can still solve it with 3 CountIf functions. (number of names starting with “R”)+(number of names ending with “S”)-(number of names starting with “R” and ending with “S”)
For B2:B11 data we wrote, in C2 cell, the formula (note that text criteria is not case sensitive): =COUNTIF(B2:B11;"R*")+COUNTIF(B2:B11;"*S")-COUNTIF(B2:B11;"R*S")
Figure 2-13 CountIf solution
Array formula solution Step 1 – Create an array that will tell us, for each name, if it starts with “R” or ends with “S” (Left and Right functions) Select a range the size of your destination data, for instance D2:D11 Write the formula =(LEFT(B2:B11)="R")+ (RIGHT(B2:B11)="S") and then <Ctrl+Shift+Enter> Note that Excel is not case sensitive when comparing text
Figure 2-14 The resulting array contains a value >0 if the name starts with “R” or ends with “S”. Note that one of the values is 2, we have a name that starts with “R” and ends with “S”. This situation must be solved, we only want 0 and 1, in order to sum the array and obtain the correct value.
2-16
Step 2 – Check if the value is greater than 0 (returns an array of (TRUE and FALSE). =((LEFT(B2:B11)="R")+ (RIGHT(B2:B11)="S"))>0 <Ctrl+Shift+Enter>
Figure 2-15 The resulting array contains only TRUE and FALSE values. TRUE if starts with “R” or ends with “S”, FALSE otherwise.
Step 3 – Transform TRUE and FALSE back to 0 and 1 (using double negative --) =--(((LEFT(B2:B11)="R")+ (RIGHT(B2:B11)="S"))>0) <Ctrl+Shift+Enter>
Figure 2-16 Transforming back to 0 and 1. Now the sum of all array elements will answer the question.
Step 4 – Copy the formula text and paste it as argument of Sumproduct. (formula in C2) =SUMPRODUCT(--(((LEFT(B2:B11)="R")+(RIGHT(B2:B11)="S"))>0)) <Enter>
Figure 2-17 The final formula, you can now delete the D2:D11 auxiliary range.
2-17
Timing both solutions We use Time It worbook to time both array and non-array solutions for this data. Figure 2-18 and 2-19 show the results.
Figure 2-18 For a small range (B2:B11) the time difference is, in this case, negligible.
Letâ&#x20AC;&#x2122;s time it again with a big range (B2:B5000)
Figure 2-19 With a bigger range (B2:B5000) the non-array solution is significantly faster, about 40%
2-18
3 CONDITIONAL AGGREGATION 3.1 COUNTIF, CONTIFS, SUMIF, SUMIFS, AVERAGEIF, AVERAGEIFS Those are the functions to use if we conditionally want to Count, Sum or Average. Use CountIf, SumIf or AverageIf if there is only one criteria. Use the Ifs counterparts to apply two or more criteria simultaneously (AND) Remember the criteria argument for these functions is text, the exception is equality. For instance, to count, in A1:A10 range, the number of values equal to 10, we can write: =COUNTIF(A1:A10;”=10”) Or =COUNTIF(A1:A10;10) However, if we use any one of the other relational operators (>, >=, <, <=, <>), we must write the criteria as text.
3.1.1 SUMMING THE VALUES GREATER THAN THE AVERAGE In this case, the criteria must reflect values greater than the average, the criteria must include the Average function. We cannot include the function inside double quotes. If we do it, the function will not be evaluated. This is an incorrect criteria: “>AVERAGE(A1:A10)” and this is the correct one: “>” & AVERAGE(A1:A10) The end formula is: =SUMIF(A1:A10;">" & AVERAGE(A1:A10))
Figure 3-1 The Average function is evaluated first, the criteria is the concatenation of “>” with the Average result.
3.2 ARRAY FORMULAS, SOLUTIONS FOR CONDITIONAL AGGREGATION 3.2.1 SUMMING THE EVEN VALUES The criteria (to be even) cannot be used with SumIf, so, if we don’t want auxiliary ranges, we need to create a formula array. To check if a number is even we can use the IsEven function, but, unfortunately, this function will not accept an array as argument, we’re going to use the MOD function instead. MOD returns the remainder after integer division, if we divide by 2, MOD returns 0 for even numbers, 1 for odd numbers.
3-19
Step 1 â&#x20AC;&#x201C; Array with TRUE and FALSE (TRUE for even values, False for odd values) Select a range the size of your destination data, for instance E2:E11 Write the formula =MOD(A1:A10;2)=0 and then <Ctrl+Shift+Enter>
Figure 3-2 Array containing TRUE for even values, FALSE for odd values
Step 2 â&#x20AC;&#x201C; Apply (--) to transform TRUE to 1 and FALSE to 0 =--(MOD(A1:A10;2)=0) <Ctrl+Shift+Enter>
Figure 3-3 0 odd 1 even
Step 3 - Replace 1 by corresponding A1:A10 values, the values we want to sum. We can obtain that by just multiplying A1:A10 by the last array. Now you can remove (--) because we already have a mathematical operation. =(A1:A10)*(MOD(A1:A10;2)=0) <Ctrl+Shift+Enter>
Figure 3-4 Now we have an array with only the even values, odd values remain as 0. As we need, in the end, to sum, zeros will be ignored.
3-20
Step 4 â&#x20AC;&#x201C; Copy the formula text and paste it as argument of Sumproduct. (formula in C2) =SUMPRODUCT((A1:A10)*(MOD(A1:A10;2)=0)) <Enter>
Figure 3-5 The end formula, as we use Sumproduct, does not need <Ctrl+Shift+Enter> The E1:E10 range can now be deleted.
What happens if we have empty cells in A1:A10? MOD will use <empty> as numeric 0, and in the end, treated as even. The Figure 29 formula will now return:
Figure 3-6 In E7 we have a false even value.
But, when we multiply by A1:A10 values, the false even will translate again to zero, the final formula will still work.
Figure 3-7 After the multiplication by (A1:A10) the false even is now treated as an odd value.
3-21
The end result, even with empty cells, is still correct: =SUMPRODUCT((A1:A10)*(MOD(A1:A10;2)=0)) <Enter>
Figure 3-8 The formula will work even with empty A1:A10 values
3.2.2 STANDARD DEVIATING THE EVEN VALUES It seems very similar to the previous problem, obtain the array given by =(A1:A10)*(MOD(A1:A10;2)=0) and then, instead of Sumproduct use Stdev, right? See that the resulting array contains zeros for the odd values, they are ignored if you want to sum, but not if you want to use Stdev. So, we must replace zeros by something ignored by these kind of functions, like text or even TRUE/FALSE values. Step 1 – Repeat, previous problem, step 1, till we obtain Figure 3-2 results. Step 2 - Now, instead of multiplying by A1:A10, we use an if to write the A1:A10 value if even and “” (empty string) if odd. As any value>0 will be treated as TRUE we will write it like this: =IF(MOD(A1:A10;2);"";A1:A10), avoiding, this way, the need to test the result of MOD.
Figure 3-9 If even A1:A10 value, odd empty space. Note that we still have the empty cell problem. A7 empty cell is treated as 0.
Step 3 – Solving the empty cell problem. The If first argument is checking for odd, returning, in that case an <empty string>. In order to ignore empty cells the formula should also return an <empty string> if the cell is empty. We can add an OR to the condition to do that (remember that OR can be replaced by +) =IF((A1:A10="")+MOD(A1:A10;2);"";A1:A10)
<Ctrl+Shift+Enter>
Figure 3-10 The array contains now only the even values, ignoring empty cells.
3-22
Step 4 – Copy the formula text and paste it as argument of StDev. In this case we need <Ctrl+Shift+Enter> =STDEV(IF((A1:A10="")+MOD(A1:A10;2);"";A1:A10)) <Ctrl+Shift+Enter>
Figure 3-11 The end formula, in this case, needs <Ctrl+Shift+Enter> The E1:E10 range can now be deleted.
We can force a non <Ctrl+Shift+Enter> solution writing instead : =STDEV(IF(MMULT((A1:A10="")+MOD(A1:A10;2);1);"";A1:A10)) Adding a matrix multiplication by 1, will not change anything in the array, it’s just a trick to avoid <Ctrl+Shift+Enter>. The cost will be slower performance.
3.2.3 THE AGGREGATE FUNCTION The Aggregate function (Excel 2013 or latter) allows us to calculate averages, sums, maximums, minimums, etc., ignoring errors or hidden rows. It is similar to the Subtotal function, but with Aggregate we can choose what to ignore. Here is the syntax, for the array and reference form
Figure 3-12
The third argument can only be an array if <Ctrl+Shift+Enter>
function_num
>13, and most of the times, we don’t need to
3-23
Figure 3-13 function_num possible values
Figure 3-14 options possible values
Example, using Average and Aggregate with a range containing errors.
Figure 3-15 Use of Aggregate function to average A1:A10 values with the option to ignore errors. Note that, as we have an error in cell A7, Average returns the same error.
3.2.4 SMALLEST EVEN VALUE We can use the same formula as the one used in the StDev problem (Figure 37) just replacing Stdev by Min.=MIN(IF((A1:A10="")+MOD(A1:A10;2);"";A1:A10)) <Ctrl+Shift+Enter> Let’s do it again using the Aggregate function. Step 1 – Obtain an array with zeros and ones, 0 if odd or empty, 1 if even. We’re going to use a condition similar to the one for the Stdev problem =--NOT((A1:A10="")+MOD(A1:A10;2)) this will return the same as =--(A1:A10<>"")*(MOD(A1:A10;2)=0)
3-24
Figure 3-16 Array returning zero for odd values and empty cells and one for even values
Step 2 – Avoiding the If - remember with the If we need to <Ctrl+Shift+Enter> To do that we’re going to divide the A1:A10 values by the previous array. Obtaining, this way, the A1:A10 value for even values and the #DIV/0! error for odd values and empty cells. =A1:A10/NOT((A1:A10="")+MOD(A1:A10;2)) <Ctrl+Shift+Enter>
Figure 3-17 Array returning #DIV/0! for odd values/ empty cells and the corresponding A1:A10 values for even values
Step 3 – Use the Aggregate function with this array as third argument, function_num = 15 (SMALL), options = 6 (ignore errors) =AGGREGATE(15;6;A1:A10/NOT((A1:A10="")+MOD(A1:A10;2));1) <Enter>
Figure 3-18 Final formula for the smallest even A1:A10 value, no need to <Ctrl+Shift+Enter> The range E1:E10 can now be deleted
This solution cannot be used for the StDev problem because the array, used as third argument, cannot be used with function_num<14.
3-25
Timing both solutions =MIN(IF((A1:A10="")+MOD(A1:A10;2);"";A1:A10)) <Ctrl+Shift+Enter> AND =AGGREGATE(15;6;A1:A10/NOT((A1:A10="")+MOD(A1:A10;2));1)
< Enter>
Time was measure, several times, by Time It, using a big range (A1:A5000). Figure 3-19 shows the obtained results.
Figure 3-19 No big difference between the two options, but the Min solution was always slightly faster.
Min/Max are, on their own, much faster than the equivalent Small/Large Aggregate, and Sum is much faster than Sumproduct, they should not be replaced outside array solutions. In an array solution, as in the previous example, the weight of Aggregate, in the total formula time, is very small. In the end, the substitution will not significantly decrease the formula performance. We time Max vs Aggregate (used as Large, ignore errors) Sum vs Sumproduct in a non-array situation, here are the results for a big range (A1:A5000)
Figure 3-20 Max vs Aggregate
Figure 3-21 Sum vs Sumproduct
3-26
4 AUXILIARY TECHNIQUES We are going to show some important techniques that can be useful as part of some, more complex, problems.
4.1 POSITION OF THE LAST NON-EMPTY CELL IN A RANGE Sometimes is useful to obtain the position of the last non-empty cell in a range. We’re going to show different technics depending on the situation.
4.1.1 NO EMPTY CELLS BETWEEN DATA. This is the simplest solution, just count the non-empty cells using Counta. works with any type of mixed data. Data in B2:B11 =COUNTA(B2:B11)
Figure 4-1
4.1.2 EMPTY CELLS BETWEEN DATA 1. Only text data Excel is not case sensitive when directly compares text. All letters (including accented ones) and most of all other characters are smaller than “Z”. Lookup functions, when used for approximate matches, with lookup_value greater than any in the list, always return the last non-empty element in the list, even if the list is not ordered. For Match this will be the position (row number), and we should use 1 or omit the last argument (approximate match). If we have mixed (text and numeric) data this will be the position for the last text value, even if we have a numeric after that. =MATCH("ZZZZZ";B2:B11) – Lookup_value =”ZZZZZ” because it is not likely text to start with more than 5 ”, you can increase this or even use the Rept function for 255 “Z” REPT(“Z”;255)
4-27
Figure 4-2 Mary, position 6 in range B2:B11, is the last text value in the range.
2. Only numeric data Where going to apply the same technique, but in this case looking for a very big numeric value, for instance 9.9E307 (it is not the biggest possible numeric Excel value but is short and close enough) =MATCH(9.9E+307;B2:B11)
Figure 4-3 Value 9, position 6 in range B2:B11, is the last numeric value in the range.
4-28
3. Mixed (numeric and text data) The formula =MATCH("ZZZZZ";B2:B11) returns the last text value even if we have a numeric value after, but will return an error if we only have numeric values. The formula =MATCH(9.9E+307;B2:B11) returns the last numeric value even if we a have a text value after, but will return an error if we only have text values.
Figure 4-4 Value 5, position 6 in range B2:B11, is the last numeric value in the range. (text value â&#x20AC;&#x153;Maryâ&#x20AC;? is the last but was ignored) As there is no numeric values in C2:C11, the same formula now returns the #NA error.
We can combine both solutions to solve the mixed (text and numbers) problem. =MAX(IFERROR(MATCH(9.9E+307;C2:C11);0);IFERROR(MATCH("ZZZZZ";C2:C11);0))
Figure 4-5 Both formulas return the correct position for the last value (text or number) For the range C2:C11 the first Match returns an error, we use IfError to return 0 instead.
This formula will not work if we have, for instance, a Boolean value as last value. 4. Mixed data (text, numeric and other) It is unlikely that we have another type of data, dates, for instance, are treated as numbers so the previous formulas will work. If we have Booleans we need an array solution. You should avoid this solution because, as we will see later, it is many times slower than the previous ones.
4-29
5. Step1 - Create an array with only zeros and ones, zero if the cell is empty, one otherwise. =--(B2:B11<>"")
Figure 4-6 0 – empty cell 1 – non-empty cell We need the position for the last 1
Step 2 – As zero is a numeric value we’re going to replace it by an error. Let’s divide 1 by this array =1/(B2:B11<>"") (Division by zero returns an error)
Figure 4-7 #DIV/0 – empty cell 1 – non-empty cell We need the position for the last 1
Step 3 – The array now only contains error and ones, if we use match to lookup a value greater than 1, we will obtain the position for the last value, errors will be ignored. =MATCH(2;1/(B2:B11<>"")) <Ctrl+Shift+Enter>
Figure 4-8 The position for the last value in a list. You can now delete the C2:C11 range
4-30
Timing all the solutions We are going to use Time It to measure the previous techniques with a big range of data. A1:A5000 contains only numeric data and empties, B1:B5000 contains text data and empties, C1:C2 contains only text values, row 5001 is empty. All formulas should return 5000.
A B C D
Figure 4-9 Timing the solutions
E
As you can see the array solution (E) should be avoided at all cost, even (D) the solution combining B and C, is about 360% faster. (D) can be used with mixed text and numbers, it will work for most situations. The Counta solution (A) is significantly slower than solutions B and C, at least for big ranges. Even if you don’t have empty cells between data, if data is only text or only numbers, (B) or (C) will be more efficient. Testing, several times, (A) and (C) with the C1:C5010 range, reveals (C) to be about 65% faster. When tested again with a small range C1:C20, no significant differences were found.
4.2 POSITION OF THE FIRST EMPTY CELL We only need to use Match to obtain the position for the first empty cell, but unfortunately, we need an array solution. Step 1 – Obtain a range with TRUE and FALSE values, TRUE if the cell is empty, FALSE if not. =B2:B11=”” <Ctrl+Shift+Enter> Figure 4-10 Array in E2:E11 TRUE – cell is empty FALSE – not empty
4-31
Step 2 - Use Match to find the position the first TRUE =MATCH(TRUE;B2:B11="";0) <Ctrl+Shift+Enter>
Figure 4-11 The position of the first empty cell in a range (D2) The same in D4, a slightly different formula not needing <Ctrl+Shift+Enter>
The formula in D4 is a variant from the one in D2 and will not need <Ctrl+Shift+Enter> =MATCH(TRUE;INDEX(B2:B11="";0);0) Normally, formulas containing arrays, returned by Index, don’t need <Ctrl+Shift+Enter>. One possible exception is when we need an IF to build the array. Remember, zero as second argument of index means we want all the rows. So B2:B11=”” and INDEX(B2:B11=””;0) return exactly the same array, but with Index we don’t need to <Ctrl+Shift+Enter>. In this case, avoiding <Ctrl+Shift+Enter>, comes with a cost, it will be, for a big array, about 15% slower.
4.3 RETURNING ROW NUMBERS In many problems, we need to obtain row number for each cell in a range. As an example, we’re going to obtain an array containing row numbers for the B2:B11 range. (numbers 1 to 10, row relative to the beginning of the range) We can use the formula: =ROW(B2:B11)-ROW(B2)+1 <Ctrl+Shift+Enter> This will return row(B2)-row(B2)+1 »1, row(B3)-row(B2)+1 »2, … , row(B11)-row(B2)+1 »10
Figure 4-12 Array with row numbers for B2:B11 range
4-32
4.4 RETURNING ROW NUMBERS IN REVERSE ORDER To return the row number in reverse order we can use the formula: =ROW(B11)-ROW(B2:B11)+1 <Ctrl+Shift+Enter>
Figure 4-13 Array with row numbers for B2:B11 range in reverse order
4.4.1 LIST OF VALUES IN REVERSE ORDER Using the last formula with the index function will return B2:B11 range in reverse order: =INDEX(B2:B11;ROW(B11)-ROW(B2:B11)+1) <Ctrl+Shift+Enter>
Figure 4-14 Array containing B2:B11 values in reverse order
4.5 ARRAY CONSTANTS We can create array constants (arrays where we explicitly refer the list of elements)
4.5.1 ARRAY CONSTANT CONTAINING DIGITS 0 TO 9 Remember when, in Figure 2-12, we pressed F9 key to evaluate the array -(LEFT(B2:B11)=RIGHT(B2:B11)), obtaining {1;0;0;0;0;0;0;0;0;0} ? So, to create a vertical array containing digits 0 to 9 we can write: {0;1;2;3;4;5;6;7;8;9} and can now be used in a formula to return those values ={0;1;2;3;4;5;6;7;8;9} «-- “<Ctrl+Shift+Enter> We can obtain the same result, without using a constant, with the array formula:
4-33
=ROW(1:10)-1 <Ctrl+Shift+Enter> The constant solution is expected to be slightly faster.
Figure 4-15 In D1:D26 – returning “0” to “9” digits with an array constant
4.5.2 ARRAY CONSTANT CONTAINING LETTERS “A” TO “Z” Literal text must be inside double quotes so we can write it like this: ={"A";"B";"C";"D";"E";"F";"G";"H";"I";"J";"K";"L";"M";"N";"O";"P";"Q";"R";"S";"T";"U";"V";" W";"X";"Y";"Z"} or using the much shorter, but about 10% slower, array formula: =CHAR(ROW(65:90)) (65 and 90 are, respectively, the codes for letter “A” and “Z”) Both formulas need <Ctrl+Shift+Enter>
Figure 4-16 In B1B26 - returning “A” to “Z” letters with an array constant In D1:D26 – returning “A” to “Z” letters with an array formula
4-34
4.6 NAMING A FORMULA We can give a name to a formula the same way we can give a name to a range. Step 1 – Select and copy the text from Figure 3-10 formula (the array containing A1:A10 even values) =IF((A1:A10="")+MOD(A1:A10;2);"";A1:A10) Select the formula destination cell, for instance, C2 (important) Step 2 – Go to Formulas – Name Manager
Figure 4-17
Step 3 – Press <New> button EvenValues as <name> paste the text formula to <Refers to>
Figure 4-18
If you don’t want the formula to adapt when copied, just replace relative references by absolute references, A1:A10 by $A$1:$A$10 . When using relative references, the formula will adapt based on the cell selected when we enter name manager, in this case C2. Press <OK>
4-35
Figure 4-19
Press <Close> Step 4 – Now we can replace the Figure 3-11 formula =STDEV(IF((A1:A10="")+MOD(A1:A10;2);"";A1:A10)) by =STDEV(EvenValues) As the array part of the formula is part of the name, we don’t even need to <Ctrl+Shift+Enter>
Figure 4-20 No need to <Ctrl+Shift+Enter>
4-36
4.7 NAMING FIRST WORD AND LAST WORD FORMULAS Step 1 – Copy the text from Figure 2-4 formula =IFERROR(LEFT(B2;FIND(" ";B2)-1);B2) select C2 cell, go to Name Manager, paste the formula to <Refers to> and give it the name FirstWord Step 2 – Copy the text from Figure 2-8 formula =IFERROR(RIGHT(B2;LEN(B2)FIND("#";SUBSTITUTE(B2;" ";"#";LEN(B2)-LEN(SUBSTITUTE(B2;" ";"")))));B2) select D2 cell, go to Name Manager, paste the formula to <Refers to> and give it the name LastWord
Figure 4-21 Creating the FirsWord and LastWord formula names.
Step 3 – In cells C2 and D2 write, respectively, =FirstWord and =LastWord Copy the formulas down, as we used relative references the formulas will adapt.
Figure 4-22 Using FirstWord and LastWord named formulas.
Don’t forget that we create the name LastWord, selecting first cell D2. The formula expects the name to be two columns left (B2). To use the formula =LastWord, anywhere in the worksheet, the formula should be written two columns to the right from the original cell containing the name. These formulas will be recognized in other worksheets if the scope <option> (see Figure 3-13) is Workbook, but will refer always cells in the original worksheet. That means that they will not work correctly if used in another worksheet, they expect the cell, containing the name, to be in the original worksheet. You can solve this changing the <scope> to worksheet and create the same names, for every worksheet where you want to use the formula. For instance, if you want to use LastWord in both Sheet1 and Sheet2, create the name LastWord in Sheet1 with <scope> worksheet, and create it again in Sheet2. Now we have two named formulas with the same name but with different scopes.
4-37
5 THE MOST FREQUENT VALUE A very common problem is to determinate the most frequent or most frequents (can be more than one) value in a given range. We include solutions for the most frequent character in a text string.
5.1 RANGE WITH NUMERIC VALUES If the range contains only numeric values, we can use the Mode or Mode.Mult functions directly.
5.1.1 One MODE Numeric values in A1:A5 =MODE(A1:A5)
Figure 5-1 Mode function
5.1.2 MORE THAN ONE MODE If we have more than one most frequent value, we can use the Mode.Mult array function. Numeric values in A1:A8. If the answer is more than one value we must select first the destination range, for instance C2:C4, write the formula: =MODE.MULT(A1:A8) <Ctrl+Shift+Enter>
Figure 5-2 Mode.Mult function returning 5 an 3 as equal most frequent values In this case Mode.Mult returns a vertical array with 2 elements, as we select a range with 3, the third cell returns an error.
What can we do to avoid that error? Instead of selecting the destination range, weâ&#x20AC;&#x2122;re going to obtain each element of Mode.Mult (using index), one at a time, and then copy the formula down. To avoid the error, we are going to use IfError.
5-38
At C2 we write: =IFERROR(INDEX(MODE.MULT($A$1:$A$8);ROWS($C$2:C2));"") <Enter> and then we can copy down till C4. When we obtain an empty cell (“”) means no more modes. Note that ROWS($C$2:C2), second argument of index, when copied down, adapts to 1,2,3..., obtaining that way, the first mode value, the second mode value, etc.
Figure 5-3 Obtaining all the Mode values, one at a time.
To obtain the number of mode values we can write: =ROWS(MODE.MULT(A1:A8))
Figure 5-4 How many mode values. We have two most frequent values 5 and 3.
5.2 RANGE WITH MIXED VALUES (NOT ONLY NUMERIC) 5.2.1 ONE MODE Mixed values in A1:A8 range Step 1 – Create an array containing the position for the first match of each A1:A8 value. Each time we have the same value in A1:A8, the position will be the same. The most repeated position will correspond to the most frequent value. Select the D1:D8 range and write:
5-39
=MATCH(A1:A8;A1:A8;0)
<Ctrl+Shift+Enter>
Figure 5-5 Array containing positions (first match) for all the A1:A8 values. The most frequent value in the array corresponds to the most frequent A1:A8 value.
Step 2 – Use Mode to obtain the most frequent position and then index to obtain the corresponding value. As the array part of the formula was used with Mode with don’t even need to <Ctrl+Shift+Enter> =INDEX(A1:A8;MODE(MATCH(A1:A8;A1:A8;0))) <Enter>
Figure 5-6 “Mary” is the most frequent value in A1:A8 range. You can now delete the D1:D8 range
5.2.2 MORE THAN ONE MODE Mixed values in A1:A8 range with two or more most frequent values Step 1 – The same as the previous problem =MATCH(A1:A8;A1:A8;0) <Ctrl+Shift+Enter>
Figure 5-7 Array containing positions (row number) for all the A1:A8 values. Now we have two most frequent positions 2 and 5.
5-40
Step 2 – Repeat Figure 5-3 method used for numeric values (our array now contains numeric row number positions). Write the formula at C2, copy down till C4. Note that now we need two Index functions, the first to obtain, one at a time, the most frequent positions, the second to obtain the corresponding value in A1:A8. Once again we don’t need <Ctrl+Shift+Enter> =IFERROR(INDEX($A$1:$A$8;INDEX(MODE.MULT(MATCH($A$1:$A$8;$A$1:$A$8;0));ROWS($ C$2:C2)));"") <Enter>
Figure 5-8 “Mary” and “Paul” are the most frequent values in A1:A8 range. You can now delete the D1:D8 range
5-41
6 TEXT STUFF We continue the pursue for the most frequent value, now in text strings.
6.1 MOST FREQUENT CHARACTER 6.1.1 MOST FREQUENT CHARACTER, CASE SENSITIVE Text in cell A1 Step 1 – Obtain an array with all the positions in text string, from 1 to the length of the string. We’re going to obtain a range with the same size as the string, and then use the Row function to obtain numbers 1, 2, 3, 4… till the length. The range will be created using the Indirect function. Indirect receives a text string representing a reference and returns, if possible, the actual reference. Write the formula in D1:D20
Figure 6-1 An array with numbers from 1 to the length of A1 string
=ROW(INDIRECT("1:" & LEN(A1)))
<Ctrl+Shift+Enter>
Step 2 – Use Mid to access each individual character in the string, returning an array with each letter in a different row.
6-42
=MID(A1;ROW(INDIRECT("1:" & LEN(A1)));1)
<Ctrl+Shift+Enter>
Figure 6-2 A vertical array with A1 characters
Step 3 - Transform each character in the corresponding ASCII code =CODE(MID(A1;ROW(INDIRECT("1:" & LEN(A1)));1)) <Ctrl+Shift+Enter>
Figure 6-3 A vertical array with codes for all A1 characters.
6-43
Step 4 – Now we only have numeric values, we can use the Mode function to obtain the most frequent code and then transform it back to character. And again, as the array part of the formula is used by Mode, we don’t need to <Ctrl+Shift+Enter>. Formula in C1 =CHAR(MODE(CODE(MID(A1;ROW(INDIRECT("1:" & LEN(A1)));1)))) < Enter>
Figure 6-4 Formula returning the most frequent A1 character, in this case it is a space
If we have more than one mode, and want to return all, we should use Mode.Mult and the techniques for numeric values, showed back in Figure 5-3
6.1.2 MOST FREQUENT CHARACTER, CASE SENSITIVE, IGNORING SPACES Instead of directly using A1 text, we’re going to use Substitute to remove the spaces (replacing them by””). In the last formulas, the reference to A1, must be replaced by SUBSTITUTE(A1;” “;””) again we don’t need to <Ctrl+Shift+Enter> =CHAR(MODE(CODE(MID(SUBSTITUTE(A1;" ";"");ROW(INDIRECT("1:" & LEN(SUBSTITUTE(A1;" ";""))));1))))
Figure 6-5 Formula returning the most frequent A1 character ignoring spaces. In this case it´s a “,” and an “e”. it shows the first “,”
6.1.3 MOST FREQUENT CHARACTER, NOT CASE SENSITIVE, IGNORING SPACES Substitute is case sensitive, If we don’t want to be case sensitive we must replace the first A1 by UPPER(A1). The second A1 does not need to be replaced because LEN(A1) = LEN(UPPER(A1)) =CHAR(MODE(CODE(MID(SUBSTITUTE(UPPER(A1);" ";"");ROW(INDIRECT("1:" & LEN(SUBSTITUTE(A1;" ";""))));1))))
Figure 6-6 Formula returning the most frequent A1 character, not case sensitive and ignoring spaces. In this case “T” is the answer, we have two uppercase “T” and one lowercase. (3 total)
6-44
6.1.4 MOST FREQUENT “A” TO “Z” LETTER, NOT CASE SENSITIVE In this case, we want to ignore everything but letters “A” to “Z”. Substitute solution Step 1 – Create an array constant with all the uppercase letters, name it ALETTERS ={"A";"B";"C";"D";"E";"F";"G";"H";"I";"J";"K";"L";"M";"N";"O";"P";"Q";"R";"S";"T";"U";"V";" W";"X";"Y";"Z"} remember, alternatively, we can use the array formula: =CHAR(ROW(65:90)) Step 2 – Obtain a vertical array with all the “A” to “Z” letters from A1 cell removed. First element A1 with “A” removed, second with “B” removed…last with “Z” removed. The array will contain 26 elements, corresponding to the 26 letters. As we don’t want to be case sensitive, we should use UPPER(A1) instead of A1. Write it in D1:D26 =SUBSTITUTE(UPPER(A1);ALETTERS;"") <Ctrl+Shift+Enter>
Figure 6-7 Array with all “A” to “Z” letter removed, one at a time
6-45
Step 3 – Obtain the length of each one of those strings =LEN(SUBSTITUTE(UPPER(A1);ALETTERS;"")) <Ctrl+Shift+Enter>
Figure 6-8 The length of the remaining string after removing the “A” to “Z” letters one by one.
The smallest value position corresponds to the most frequent letter. For languages with accented characters like “É” or “Á”, if you want to treat them as letters, you should include the uppercase version of those letters in the ALETTERS array. In that case “E” and “É” will be treated as different letters. Step 4 - Use Match to calculate that position. Copy the text from the above formula and write in C1: =MATCH(MIN(LEN(SUBSTITUTE(UPPER(A1);ALETTERS;"")));LEN(SUBSTITUTE(UPPER(A1);A LETTERS;""));0) <Ctrl+Shift+Enter>
Figure 6-9 The position of the smallest value corresponds to the most frequent letter. 9th letter of the alphabet “I” The D1:D26 range can now be deleted.
6-46
Step 5 – Obtain the letter, we can use Index with the ALETTERS array, or use CHAR(64 + MATCH) =INDEX(ALETTERS;MATCH(MIN(LEN(SUBSTITUTE(UPPER(A1);ALETTERS;"")));LEN(SUBSTITUTE(UPPER(A1);ALETTERS;"") );0))”<Ctrl+Shift+Enter>
Figure 6-10 “I” is the most frequent letter in A1 text. Note that we have more “$” than “I”, but the formula ignores everything but “A” to “Z” letters.
Mode Solution Step 1 – Match all A1 characters, against the ALETTERS array, using an approximate match. As ALETTERS contains all uppercase letters in alphabetic order, and almost all other symbols evaluate as smaller than “A”, Match will return an error for non-letters characters. All characters evaluated as greater than “Z” will match the last element in the array “Z”. Depending on the language, it will be only one or two strange symbols, not likely to be found on phrases. For languages with accented characters Match will return the position, in ALETTERS, for the correspondent non-accented letter. For instance, if text contains an “É”, Match will return the position of “E”. Match is not case sensitive, all lowercase or uppercase letters, accented or not, will return the position of the correspondent, nonaccented, uppercase letter. If we want to ignore the accented letters, we should use an exact match (0 as last argument). The returned array will be the size of the A1 string, containing positions for each A1 letter. Each time we have the same letter the same position will be returned. In the end, the most frequent position, will correspond to the most frequent A1 letter. =MATCH(MID(A1;ROW(INDIRECT("1:" & LEN(A1)));1);ALETTERS) <Ctrl+Shift+Enter>
Figure 6-11 An Array with the position of each A1 letter inside ALETTERS. All non-letters characters return an error.
6-47
Step 2 – Replace the error by something ignored by Mode, for instance empty text (“”). We will use IfError. =IFERROR(MATCH(MID(A1;ROW(INDIRECT("1:" & LEN(A1)));1);ALETTERS);"") <Ctrl+Shift+Enter>
Figure 6-12 An Array with the errors replaced by an empty string “”
Step 3 – Use Mode to obtain the most frequent position. Because of IfError we will need to <Ctrl+Shift+Enter> =MODE(IFERROR(MATCH(MID(A1;ROW(INDIRECT("1:" & LEN(A1)));1);ALETTERS);""))
Figure 6-13 Mode returning the most frequent position
Step 4 – Use of Index to return the ALETTERS corresponding letter. =INDEX(ALETTERS;MODE(IFERROR(MATCH(MID(A1;ROW(INDIRECT("1:" & LEN(A1)));1);ALETTERS);""))) <Ctrl+Shift+Enter>
Figure 6-14 “I” is the most frequent “A” to “Z” letter, all non-letter characters ignored.
Timing the solutions In the first solution (SUBSTITUTE) the created array is always the size of ALETTERS, it will not depend on A1 length. For the second solution (MODE) the created array is the same size as A1 length. We can expect the first solution to be more efficient when A1 length increases. Figure 6-15 shows Time It results for the small text used in the example
6-48
Figure 6-15
And in Figure 6-16 for a 150 characters’ length text
Figure 6-16
As expected, the time difference increases when A1 length increases. But don’t forget, the Mode-Match solution is the only one that will treat an accented and the corresponding non-accented letter as being the same. If we have accented letters, the two solutions, can return different results. To return the same, the Mode-Match solution should be used with an exact match. (last argument of Match = 0) Using an exact match will even show a bigger time difference between solutions.
6.2 NUMBER OF DISTINCT CHARACTERS We’re going to find out the number of distinct characters for different scenarios.
6.2.1 NUMBER OF DISTINCT CHARACTERS, SPACE INCLUDED Count/Find solution Step 1 – Obtain the position of all characters (code 32 to 255) inside A1 string. Find function. If the character does not exist in A1 Find will return an error. The Find function is case sensitive, lower and uppercase letters will count as different characters. In the end of this chapter we will try and time another solution. =(FIND(CHAR(ROW(32:255));A1)) <Ctrl+Shift+Enter>
Figure 6-17 A partial view of the returned array, as we start at code 32 (space), the first value (3) is the position of the first space in A1. Note that the array end at row 224.
6-49
Step 2 – Use of Count to count the number of numeric values. Count will ignore errors. =COUNT(FIND(CHAR(ROW(32:255));A1)) <Ctrl+Shift+Enter>
Figure 6-18 Number of A1 distinct characters.
Substitute solution Step 1 - Obtain, for all the A1 Characters, the frequency of each one inside A1 =LEN(A1)-LEN(SUBSTITUTE(A1;MID(A1;ROW(INDIRECT("1:"&LEN(A1)));1);""))
<Ctrl+Shift+Enter>
Figure 6-19 The frequency of each A1 character in A1 Note that, each time the same character appears, we obtain the same result.
Step -2 - As each repeated character appears with the same value we’re going to divide 1 by the results. For instance, as we have four spaces, the space will appear four times, each time with four as result. Dividing 1 by four and we obtain 0.25, as we have four spaces 4 x 0.25 = 1. That means that each individual character always sums one. =1/(LEN(A1)-LEN(SUBSTITUTE(A1;MID(A1;ROW(INDIRECT("1:"&LEN(A1)));1);""))) <Ctrl+Shift+Enter>
Figure 6-20 1/ (frequency of each A1 character)
6-50
Step 3 â&#x20AC;&#x201C; Sum the array to obtain the number of distinct characters. Copy the formula text and use it as argument of Sumproduct, write it in C2. The range D1:D17 can now be deleted. =SUMPRODUCT(1/(LEN(A1)LEN(SUBSTITUTE(A1;MID(A1;ROW(INDIRECT("1:"&LEN(A1)));1);"")))) <Enter>
Figure 6-21 Number of A1 distinct characters.
Timing the solutions Figure 6-22 shows Time It results for the small text used in the example
Figure 6-22
At first sight the substitute solution seams mush faster. But the size of the used array depends on the string size, we expect the difference to decrease as the size increases. And Figure 6-23 with a 150 character length string
Figure 6-23
As predicted, with a much bigger string, the count solution is faster. As a good all-around Count/Find is preferable and a more elegant solution.
6.2.2 NUMBER OF DISTINCT CHARACTERS, SPACE IGNORED As 32 is the code for the space character, just use ROW(33:255) instead of ROW(32:255) =COUNT(FIND(CHAR(ROW(33:255));A1)) <Ctrl+Shift+Enter>
Figure 6-24 Number of A1 distinct characters, space ignored
If you do not want to be case sensitive just replace A1 by UPPER(A1)
6-51
6.2.3 NUMBER OF DISTINCT “A” TO “Z” LETTERS, CASE SENSITIVE We’re only going to replace the CHAR(ROW(33:255)) by our ALETTERS array constant. ALETTERS contains all the “A” to “Z” letters. In order to be case sensitive, we must include the lowercase letters. We can include them in the array constant or add the uppercase count with the lower case count. =COUNT(FIND(ALETTERS;A1))+COUNT(FIND(LOWER(ALETTERS);A1)) <Ctrl+Shift+Enter> or =COUNT(FIND(CHAR(ROW(65:90));A1))+COUNT(FIND(CHAR(ROW(97:122));A1)) <Ctrl+Shift+Enter>
Figure 6-25 Number of A1 distinct “A” to “Z” letters, case sensitive. The “I” and “i” will count as two different letters
6.2.4 NUMBER OF DISTINCT “A” TO “Z” LETTERS, NOT CASE SENSITIVE We’re going to replace Find (case sensitive function) by Search (equivalent but not case sensitive). In this case as looking for “a” is the same as looking for “A”, we only need the original ALETTERS array constant. =COUNT(SEARCH(ALETTERS;A1)) <Ctrl+Shift+Enter>
Figure 6-26 Number of A1 distinct “A” to “Z” letters, not case sensitive. The “I” and “i” will count as only one letter
6.3 OBTAINING THE NTH WORD FROM TEXT 6.3.1 FORMULA SOLUTION We already saw solutions to obtain first (Figure 2-4) and last word (Figure 2-8), let us see how to obtain any word. As an example, we’re going to obtain the 3rd word in a text string (n=3). Text in A1 cell. The method only works if we have just one space between words. Step 1 – Add a space before A1, that way the same formula will work even for the first word, and the nth space will be immediately before the nth word. Replace the 3rd space, using Substitute, by a unique symbol (something we’re sure will not exist in the original string). As an example, we’re going to use “#”. =SUBSTITUTE(" " & A1;" ";"#";3) Figure 6-27 3rd space replaced by “#”, after adding a space at the beginning.
6-52
Step 2 – Use Find to get the position of “#” =FIND("#";SUBSTITUTE(" " & A1;" ";"#";3))
Figure 6-28 3rd space position, after adding a space at the beginning.
Step 3 Use the same formula to obtain the next space position (nth+1 -» in this case the 4th) In this case we’re going to add a space in the end, making sure it will work even for the last word, and the space will be always after the nth space. =FIND("#";SUBSTITUTE(A1 & " ";" ";"#";3)) Step 4 – Use Mid to obtain the word between the two positions. We will end up with a big formula but build of three identical parts. =MID(A1;FIND("#";SUBSTITUTE(" " & A1;" ";"#";3));FIND("#";SUBSTITUTE(A1 & " ";" ";"#";3))FIND("#";SUBSTITUTE(" " & A1;" ";"#";3)))
Figure 6-29 “any” is the 3rd word in A1
It is easy to see that, to obtain the nth word, we only need to replace 3 by n. For a value of n greater than the number of words, the formula will return an error. We can use, if we want, IfError to return an empty string instead.
6.3.2 VBA SOLUTION A versatile solution will include the option to choose the value of N. With that in mind the NWord UDF function was created. You can find here a complete VBA tutorial. The syntax is: NWord(Text; N; Clean) And the arguments: Text (required)
A text string
N (required)
The Nth word to return, if N<0 or N>number of words in Text, the last word will be returned
Clean (optional boolean)
If you want extra spaces to be removed before returning the word Omitted or FALSE - extra spaces will not be removed TRUE – spaces will be removed
6-53
VBA NWord function FUNCTION NWORD(BYVAL S AS STRING, BYVAL N AS INTEGER, OPTIONAL C AS BOOLEAN=FALSE) AS STRING DIM P AS INTEGER, V, L AS INTEGER IF C THEN S = APPLICATION.TRIM(S) END IF IF S = "" THEN NWORD = "" ELSE IF N > 0 THEN V = SPLIT(S, , N + 1) L = UBOUND(V) + 1 IF N > L THEN N = L NWORD = V(N - 1) ELSE L = INSTRREV(S, " ") IF L = 0 THEN NWORD = S ELSE NWORD = RIGHT(S, LEN(S) - L) END IF END IF END IF END FUNCTION
Figure 6-30 Using the NWord VBA function
6-54
6.4 TEXT SPLIT Let us see how to return all the words, one at each row. We want to get the first word in C1, second in C2, etc. We write the formula in C1, when we copy to C2 we want the formula to adapt and return the second word. The trick is to replace the 3, in the previous formula, by ROWS($C$1:C1) As we’re going to copy the formula down the reference to A1 must be absolute. In the end, we apply IfError to remove the error for the 5th word (we only have 4 words). =IFERROR(MID($A$1;FIND("#";SUBSTITUTE(" " & $A$1;" ";"#";ROWS($C$1:C1)));FIND("#";SUBSTITUTE($A$1 & " ";" ";"#";ROWS($C$1:C1)))-FIND("#";SUBSTITUTE(" " & $A$1;" ";"#";ROWS($C$1:C1))));"")
Figure 6-31 All A1 words. The first time we get an empty cell means no more words.
6.5 TEXT JOIN (WITH VBA ALTERNATIVE SOLUTION) If you have Excel an office 365 subscription you can use the new TextJoin function. The syntax is: TEXTJOIN(delimiter; ignore_empty; text1; [text2], …) And the arguments description as provided by Microsoft: delimiter (required)
A text string, either empty, or one or more characters enclosed by double quotes, or a reference to a valid text string. If a number is supplied, it will be treated as text.
ignore_empty (required)
If TRUE, ignores empty cells.
text1 (required)
Text item to be joined. A text string, or array of strings, such as a range of cells.
[text2, ...] (optional)
Additional text items to be joined. There can be a maximum of 252 text arguments for the text items, including text1. Each can be a text string, or array of strings, such as a range of cells.
Infortunately, if you do not have an office 365 subscription, you need to use VBA. I include a VBA short UDF function that will provide most of TextJoin functionalities.
6-55
VBA Textjoin function Function TextJoin(Deli, IgnBlank, M) As String Dim v, x, s As String If IsMissing(Deli) Then Deli = "" If IsMissing(IgnBlank) Then IgnBlank = False s = "" v=M If TypeName(v) = "Variant()" Or TypeName(v) = "Range" Then If Not IgnBlank Then For Each x In v s = s & x & Deli Next Else For Each x In v If x <> "" Then s = s & x & Deli End If Next End If s = Left(s, Len(s) - 1) TextJoin = s Else TextJoin = v End If End Function
6.5.1 INSERTING SPACES BETWEEN EACH LETTER We wrote the word “EXCEL” in cell A1, and use the Figure 6.2 formula to create an array with all the letters. MID(A1;ROW(INDIRECT("1:" & LEN(A1)));1) Now we use TextJoin to join the array elements back to a string, using a space as delimiter. =textjoin(" ";TRUE;MID(A1;ROW(INDIRECT("1:" & LEN(A1)));1) ) <Ctrl+Shift+Enter>
Figure 6-32 Use of TextJoin VBA function.
6-56
6.5.2 REMOVING IRREGULAR SPACING For some of these examples work, text cannot contain leading spaces, trailing spaces or more than one space between words. Sometimes we need to clean our text first. The Trim function can be used to do that, Trim removes leading/trailing spaces and even extra spaces between words. =TRIM(A1)
Figure 6-33 Use Trim to remove extra spaces.
6.6 NUMBER OF WORDS IN A TEXT COLUMN Step 1 – Create an array containing the number of words in each cell. Remove the spaces using the Substitute function. The original length minus the length after removing spaces will be the number of spaces. The number of spaces plus one will be the number of words. Names in B2:B11, select D2:D11, write the formula: =1+LEN(B2:B11)-LEN(SUBSTITUTE(B2:B11;" ";"")) <Ctrl+Shift+Enter>
Figure 6-34 Number of words in each row
<Ctrl+Shift+Enter> Step 2 – Copy the text from last formula, paste it as argument for the Sumproduct function, after this you can delete the D2:D11 auxiliary range. Formula in C2 =SUMPRODUCT(1+LEN(B2:B11)-LEN(SUBSTITUTE(B2:B11;" ";""))) < Enter
Figure 6-35 We have 22 words in B2:B11 range D2:D11 range can now be deleted
6-57
6.7 BIGGEST STRING IN A TEXT COLUMN Step 1 – Create an array with the length for all B1:B10 strings =LEN(B2:B11) <Ctrl+Shift+Enter>
Figure 6-36 Length for all B2:B11 names D2:D11 range can now be deleted
Step 2 – Calculate, in this array, the position for the biggest value =MATCH(MAX(LEN(B2:B11));LEN(B2:B11);0) <Ctrl+Shift+Enter>
Figure 6-37 A2:A11 row 5 contains the biggest name D2:D11 range can now be deleted
6-58
Step 3 – Use index to obtain the corresponding name =INDEX(B2:B11;MATCH(MAX(LEN(B2:B11));LEN(B2:B11);0))
<Ctrl+Shift+Enter>
Figure 6-38 A2:A11 biggest name
6.8 SUM OF SQUARES OF INDIVIDUAL DIGITS IN A NUMBER We are going to treat the A1 number as a string. Excel will convert the number to string if used with a text function. The A1 value can be numeric or text (containing digits). Step 1 – Create an array with all the digits. Select a range starting at D1 till the size of A1 string. =MID(A1;ROW(INDIRECT("1:" & LEN(A1)));1) <Ctrl+Shift+Enter>
Figure 6-39 A1 individual digits
Step 2 – Square each digit =MID(A1;ROW(INDIRECT("1:" & LEN(A1)));1)^2
<Ctrl+Shift+Enter>
Figure 6-40 Squaring all A1 digits
6-59
Step 3 – Sum all array elements, use the text from previous formula as argument of Sumproduct. =SUMPRODUCT(MID(A1;ROW(INDIRECT("1:" & LEN(A1)));1)^2) <Enter>
Figure 6-41 Calculating sum of square of digits for all the A1:A3 numbers. Write the formula in B1 and copy down till B3 The D1:D5 range can now be deleted
6.9 CREDIT CARD VALIDATION Most credit cards (VISA, MASTERCARD, AMERICAN EXPRESS, etc.) use the Luhn algorithm, also known as “Mod 10” algorithm, to validate card numbers. The last digit (check digit) is used to make sure that checksum MOD 10 = 0 Calculating the checksum Start with the first digit from the right (check digit) and move left, summing alternatively digit+2xdigit+digit+2xdigit+…. When doubling the digit, if the value exceeds 9 we should subtract 9. For example, number ending with …27738 --» 8 + 2x3 + 7 + 2x7 + 2 --» 8 + 6 + 7 + 14 + 2… as 14 exceeds 9, we must subtract 9 to this one --» 8 + 6 + 7 + 5 +2… If we are sure that the number of digits is an even value, we can go left to right, doubling the digits in odd positions. We can do that adding the necessary number of “0” in the beginning of the number. If the sum ends with “0” the number is valid otherwise is invalid. Implementation Step 1 – We are going to admit that the number of digits is always 16 (because it is an even number and the maximum number of digits we can find in a card). In the end we will add the extra zeros. Create an array with all A1 digits, write in D1:D16: =MID(A1;ROW($1:$16);1) <Ctrl+Shift+Enter>
Figure 6-42 All A1 digits
6-60
Step 2 – Double the digits in odd positions, we just sum the digit again for odd positions. =MID(A1;ROW($1:$16);1)+MOD(ROW($1:$16);2)*MID(A1;ROW($1:$16);1) <Ctrl+Shift+Enter>
Figure 6-43 Doubling the digits in odd positions
Step 3 – Subtract 9 to values >9 We’re going to end with a big formula but build of many identical parts. =MID(A1;ROW($1:$16);1)+MOD(ROW($1:$16);2)*(MID(A1;ROW($1:$16);1)(2*MID(A1;ROW($1:$16);1)>9)*9) <Ctrl+Shift+Enter>
Figure 6-44 Subtracting 9 to values >9
6-61
Step 4 – Add zeros to the beginning of the number, making sure the size is always 16 digits. We’re going to replace all occurrences of A1 by REPT("0";16-LEN(A1))&A1. Sum all the values in the array. Copy the formula text and use it as argument of Sumproduct, write it in B2. The range D1:D16 can now be deleted. =SUMPRODUCT(MID(REPT("0";16LEN(A1))&A1;ROW($1:$16);1)+MOD(ROW($1:$16);2)*(MID(REPT("0";16LEN(A1))&A1;ROW($1:$16);1)-(2*MID(REPT("0";16-LEN(A1))&A1;ROW($1:$16);1)>9)*9)) <Enter>
Figure 6-45 Checksum
Step 5 – If the sum ends with 0 we have a valid card number otherwise the number is invalid. The final formula will return TRUE for valid numbers and FALSE for invalid. (MOD 10) =MOD(SUMPRODUCT(MID(REPT("0";16-LEN(A1))&A1;ROW($1:$16);1) +MOD(ROW($1:$16);2)*(MID(REPT("0";16-LEN(A1))&A1;ROW($1:$16);1)-(2*MID(REPT("0";16LEN(A1))&A1;ROW($1:$16);1)>9)*9));10)=0 <Enter>
Figure 6-46 Checking A1:A4 card numbers.
Use this formula in Data Validation if you need to enter several card numbers by hand.(see chapter 11) Use this formula with Conditional Formatting to check a list containing many card numbers. (see chapter 12)
6-62
7 RANDOM VALUES WITHOUT REPETITIONS Based on a range of values we want to obtain a set of those values, in random order without repetitions.
7.1 EXTRACTED FROM A RANGE 7.1.1 USING AN AUXILIARY RANGE AND THE RANK FUNCTION Range B2:B11 contain a list of names, we want to obtain in E2:E11 all those names in random order. Step 1 â&#x20AC;&#x201C; fill the D2:D11 range with [0..1[ random numbers (Rand() function) =RAND() write in D2, copy down till D11
Figure 7-1 Rand() used to obtain random numbers
At this point I always ask my students if it is possible to obtain a repeated number. They always say that it is unlikely but can happen. Not so, we can be 100% sure that it will not happen. We do not have really random number in computers, we have a mathematical, more or less complex, code to generate the number. The code is feed with a number (seed) and will generate another (pseudo random) based on that seed. Figure 7-2 shows the process
Figure 7-2 Generation of Pseudo Random Numbers in computers
We call those numbers pseudo random because, given the same seed, we always obtain the same number. The last generated number will be used as seed for the next generation. The first seed is based on system date and time. Sooner or later the same seed will appear, and from that point on we have a repeated sequence. For a good random generator, the period till the same sequence repeats is very, very large, that is why is completely impossible to have a repeated value for any given excel range.
7-63
Step 2 â&#x20AC;&#x201C; Use the Rank function to rank each one of the random numbers against all of them. As we have 10 different numbers in C2:C11, we will obtain 10 different ranks (1 to 10). Write the formula in E2, copy down till E11 =RANK(D2;$D$2:$D$11)
Figure 7-3 All 1-10 numbers in random order obtained with Rank.
Step 3 â&#x20AC;&#x201C; Use Index to obtain the values in the corresponding positions =INDEX($B$2:$B$11;RANK(D2;$D$2:$D$11))
Figure 7-4 All B2:B11 values in random order
7-64
7.1.2 NO AUXILIARY RANGE This will be much more complex. If you can afford an auxiliary range use the previous, very simple method, instead. Suppose we already have two random names and want to obtain a third. We want to make sure that the third one is different from the previous two. Note that the first random name can be obtained by Index and Randbetween. So, admitting that we already have two, no repeated, names in D2 and D3. Step 1 – Obtain in E2:E3 an array containing, for all possible names, a zero or one. One if the name already exists in D2:D3, zero otherwise. We’re going to use Match. =MATCH($B$2:$B$11;$D$2:D3;0) <Ctrl+Shift+Enter>
Figure 7-5 Values not used Match returns #N/A
Step 2 – Replace the errors (corresponding to not yet used names) by corresponding row number, obtaining, this way, an array with not used positions. =(ROW($B$2:$B$11)-ROW($B$2)+1)*ISERROR(MATCH($B$2:$B$11;$D$2:D3;0)) <Ctrl+Shift+Enter>
Figure 7-6 Array with all positions not yet used (note that 3 and 7 were replaced by zero)
7-65
Step 3 â&#x20AC;&#x201C; To obtain the value for D4 (third random position) we need a random value between 1 and 8 (8 positions still not used) and then use Large (or the equivalent Aggregate) to obtain the corresponding large value. For instance, if the random value is 5, we want the 5th large value in the array (6). Note that for the next random we want a value between 1 and 7, then 1 and 6, etc. To accomplish that we will use as last argument of Aggregate (or Large) RANDBETWEEN(1; ROW($B$11)-ROW(B3)) (11-3 = 8) Write the formula in D4, it will be our next random position. Copy the text from last formula and use it as third argument for Aggregate, we can now delete E2:E11 range. =AGGREGATE(14;4;(ROW($B$2:$B$11)ROW($B$2)+1)*ISERROR(MATCH($B$2:$B$11;$D$2:D3;0)); RANDBETWEEN(1; ROW($B$11)ROW(B3))) < Enter>
Figure 7-7 Next random position, not corresponding, for sure, to D2:D3 names position
Step 4 â&#x20AC;&#x201C; Use Index to obtain the corresponding B2:B11 name =INDEX($B$2:$B$11;AGGREGATE(14;4;(ROW($B$2:$B$11)ROW($B$2)+1)*ISERROR(MATCH($B$2:$B$11;$D$2:D3;0));RANDBETWEEN(1; ROW($B$11)ROW(B3)))) < Enter>
Figure 7-8 Next random name, different from D2:D3 names
7-66
Step 5 – Write in D2, to obtain the first random name: =INDEX(B2:B11;RANDBETWEEN(1;ROWS(B2:B11))) Now copy the D4 formula up till D3 and down till D11
Figure 7-9 In D2:D9 random B2:B9 names without repetitions
7.1.3 VBA VERSATILE SOLUTION The above solutions need the original values to be in a column range and destination to be a cell, then we copy down the formula till we have the pretended number of random values. In a versatile solution we should be able to select any range as origin and any range as destination. With that in mind the RandV UDF function was created. The syntax is: RandV(Orig_Range; Rep) And the arguments Orig_Range (required)
A range or Array containing the original values, the returned values will be randomly extracted from this range/array. (it will return an error if the argument is just a cell or not a range/Array)
Rep (optional)
0 or ignored – no repetitions allowed 1 – repetitions allowed
7-67
VBA RandV function
FUNCTION RANDV(M AS VARIANT, OPTIONAL REP =0) AS VARIANT DIM V, NR AS INTEGER, NC AS INTEGER, NRD AS INTEGER, NCD AS INTEGER DIM P AS INTEGER, PR AS INTEGER, PC AS INTEGER, VOUT DIM I AS INTEGER, J AS INTEGER APPLICATION.VOLATILE ' FUNCTION WILL RECALCULATE IF ANYTHING CHANGES IN THE WORKSHEET V = M ' COPY ORIGINAL RANGE/ARRAY TO AN ARRAY
NR = UBOUND(V, 1) ' ORIGIN RANGE NUMBER OF ROWS/COLUMNS NC = UBOUND(V, 2) NR = NR * NC ' ORIGIN TOTAL NUMBER OF CELLS NRD = APPLICATION.CALLER.ROWS.COUNT ' DESTINATION RANGE NUMBER OF ROWS/COLUMNS
NCD = APPLICATION.CALLER.COLUMNS.COUNT IF NR < NRD * NCD THEN ' RETURN AN ERROR IF DESTINATION TOTAL NUMBER OF CELLS > ORIGIN TOTAL NUMBER OF CELLS
RANDV = CVERR(XLERRNA) ELSE RANDOMIZE ' INITIAL SEED FOR THE RANDOM NUMBER GENERATOR BASED ON DATE AND TIME REDIM VOUT(1 TO NRD, 1 TO NCD) ' RESIZE DESTINATION ARRAY FOR I = 1 TO NRD ' GENERATE A TOTAL OF NRD*NCD RANDOM NUMBERS FOR J = 1 TO NCD P = INT(1 + NR * RND) ' GET A RANDOM INTEGER NUMBER BETWEEN 1 AND NR PR = 1 + (P - 1) \ NC ' TRANSLATE TO ROW/COLUMN PC = 1 + (P - 1) MOD NC VOUT(I, J) = V(PR, PC) ' STORE THE VALUE IN POSITION (PR,PC) IN THE DESTINATION ARRAY (I,J) POSITION
IF REP=0 THEN ' NO REPETITIONS? V(PR, PC) = V(1 + (NR - 1) \ NC, 1 + (NR - 1) MOD NC) ' COPY THE LAST ONE TO THE RANDOM (PR,PC) POSITIONS
NR = NR - 1 ' DECREASE NR END IF NEXT NEXT Example of use: Range B3:C7 contains 10 distinct names, we want to obtain, in range G4:I5, 6 of those names at random and without repetitions. The same in range K4:L6 but now repetitions are allowed. Step 1 – Select the destination range (The size of this range will be the number of random values) Step 2 – With G4:I5 selected write: =RandV(B3:C7) <Ctrl+Shift+Enter> Second argument omitted (no repetitions) Step 3 – With K4:L6 selected write: =RandV(B3:C7;1) <Ctrl+Shift+Enter> Second argument = 1 (repetitions allowed) As RandV can also receive an array, you can use next formula to obtain Integer values between [10 50] =RandV(row(10:50)) <Ctrl+Shift+Enter> The number of random values will be the size of the selected, destination, range
See 7.2.3 function DistRandBet for a more efficient solution to return integer random values without repetitions
7-68
Figure 7-10 Use of VBA RandV function
7.2 DISTINCT INTEGER RANDOM NUMBERS BETWEEN TWO VALUES 7.2.1 BETWEEN 1 AND N Suppose we want in (A1:A10) 10 random numbers, between 1 and 20, without repetitions. Write in A1 (to obtain the first number) =RANDBETWEEN(1;20) and, following the Figure 7-9 logic, in A2 ROW($1:$20) will return an array with all possible [1..20] values =AGGREGATE(14;4;ROW($1:$20)*ISERROR(MATCH(ROW($1:$20);$A$1:A1;0)); RANDBETWEEN(1;20-ROWS($A$1:A1))) < Enter>
Figure 7-11 Distinct random numbers between 1 and 20
7.2.2 BETWEEN K AND N Suppose we want in (A1:A10) 10 random numbers, between 10 and 50, without repetitions. Write in A1 (to obtain the first number) =RANDBETWEEN(10;50) and, following the Figure 7-9 logic, in A2 ROW($10:$50) will return an array with all possible [10..50] values =AGGREGATE(14;4;ROW($10:$50)*ISERROR(MATCH(ROW($10:$50);$A$1:A1;0));RANDBETW EEN(1;41-ROWS($A$1:A1))) < Enter>
7-69
Figure 7-12 Distinct random numbers between 10 and 50
Note that in RANDBETWEEN(1;41-ROWS($A$1:A1))) the value 41 means (50-10+1)
7.2.3 VBA SOLUTION In a versatile solution, we should be able to select any range as destination and select bottom and top values With that in mind the DistRandBet UDF function was created. The syntax is: DistRandBet(Bottom; Top; Rep) And the arguments: Bottom (required)
The smallest integer possible to be returned
Top (required)
The largest integer possible to be returned
Rep (optional)
0 or ignored â&#x20AC;&#x201C; no repetitions allowed 1 â&#x20AC;&#x201C; repetitions allowed
7-70
VBA DistRandBet function FUNCTION DISTRANDBET(BYVAL L AS LONG, U AS LONG, OPTIONAL REP=0) AS VARIANT DIM NR AS INTEGER, NC AS INTEGER DIM V(), VOUT(), I AS LONG, J AS LONG, P AS LONG APPLICATION.VOLATILE NR = APPLICATION.CALLER.ROWS.COUNT NC = APPLICATION.CALLER.COLUMNS.COUNT IF U < L THEN P=L L=U U=P END IF REDIM V(L TO U) IF U - L < NR * NC - 1 AND REP = 0 THEN DISTRANDBET = CVERR(XLERRNA) ELSE REDIM VOUT(1 TO NR, 1 TO NC) RANDOMIZE FOR I = L TO U V(I) = I NEXT FOR I = 1 TO NR FOR J = 1 TO NC P = L + INT((U - L) * RND) VOUT(I, J) = V(P) IF REP = 0 THEN V(P) = V(U) U=U-1 END IF NEXT NEXT DISTRANDBET = VOUT END IF END FUNCTION
7-71
Example of use: Range B2:C7 distinct integer random numbers between 20 and 50. Range E2:G4 distinct integer random numbers between 90000 and 100000 Range B9:F10 integer random numbers between 12 and 20, repetitions allowed Step 1 – Select the destination range (The size of this range will be the number of random values) Step 2 – With B2:C7 selected write: =DistRandBet(20;50)
<Ctrl+Shift+Enter>
Step 3 – With E2:G4 selected write: =DistRandBet(90000;100000) <Ctrl+Shift+Enter> Step 4– With B9:F10 selected write: =DistRandBet(12;20;1) <Ctrl+Shift+Enter>
Last argument omitted (no repetitions)
Last argument omitted (no repetitions)
Last argument = 1 (repetitions allowed)
Figure 7-13 Use of DistRandBet VBA function
7-72
8 PUT THINGS IN ORDER WITH FORMULAS In this chapter, we going to learn how to use formulas to order a set of numeric/text data, and how to obtain Top N data.
8.1 ORDERING NUMERIC DATA 8.1.1 ASCENDING ORDER Range A1:A10 contains numeric values, we want the same values, in ascending order, in range C1:C10. Step 1 â&#x20AC;&#x201C; Use in C1 the Small function to obtain the smallest value. =SMALL($A$1:$A$10;1) < Enter>
Figure 8-1 The smallest value
Step 2- Replace Small second argument by Rows($C$1:C1) so it will adapt to 2, 3,4, etc. when copied down. =SMALL($A$1:$A$10;ROWS($C$1:C1)) < Enter> (write in C1 and copy down till C10)
Figure 8-2 C1:C10 contains now A1:A10 values in ascending order
8-73
To obtain all values at once with an array formula Select C1:C10 range and write: =SMALL($A$1:$A$10;ROW(A1:A10)-ROW(A1)+1) <Ctrl+Shift+Enter>
Figure 8-3 C1:C10 contains now A1:A10 values in ascending order
8.1.2 DESCENDING ORDER For the copy down solution replace =SMALL($A$1:$A$10;ROWS($C$1:C1)) by =LARGE($A$1:$A$10;ROWS($C$1:C1)) < Enter>
Figure 8-4 C1:C10 contains now A1:A10 values in descending order
8-74
For the all at once array formula solution, replace =SMALL($A$1:$A$10;ROW(A1:A10)-ROW(A1)+1) by =LARGE($A$1:$A$10;ROW(A1:A10)-ROW(A1)+1) <Ctrl+Shift+Enter>
Figure 8-5 C1:C10 contains now A1:A10 values in descending order
8.2 ORDERING TEXT DATA 8.2.1 USING AN AUXILIARY RANGE Range A1:A10 contains names, we want the same names, in ascending order, in range C1:C10. As Small and Large only works for numeric data, they cannot be used directly. Note: ordering non-numeric data with formulas will always be slow for big ranges, avoid it if possible. This is especially true for solutions with no auxiliary ranges. Weâ&#x20AC;&#x2122;re going to use B1:B10 as an auxiliary range. Step 1 - Write in B2: =COUNTIF($A$1:$A$10;"<" & A1) and copy down till B10, this will give you how many values are smaller than A1, how many are smaller than A2,â&#x20AC;Ś how many are smaller than A10.
Figure 8-6 Smaller the value, smaller the name in alphabetic order Note that, for repeated names, the value is the same
8-75
Step 2 – Transform previous array so we have no repeated values from 1 to 10, to do that we’re going to sum the number or equal values till that point. That means, if it is the first time we have a name, it will sum one, the second time we have the same name, it will sum two. That way the values 5, corresponding to the repeated name “John Osborne” will now be 6 and 7. Replace B2 formula by: =COUNTIF($A$1:$A$10;"<" & A1)+COUNTIF($A$1:A1;A1)
Figure 8-7 Distinct values 1 to 10 corresponding to the A1:A10 names alphabetic order
Step 3 – In C2 obtain the name corresponding to the smallest value (1) and then copy down till C10 =INDEX($A$1:$A$10;MATCH(ROWS($C$1:C1);$B$1:$B$10;0))
Figure 8-8 A1:A10 names in alphabetic order
8-76
We can replace the formula in B1:B10 by: =SUMPRODUCT(--($A$1:$A$10<A1))+SUMPRODUCT(--($A$1:A1=A1)) This will be slightly slower and returns exactly the same values for the previous data. In the case of data with a mix of text and numbers the first method fails but this one will still work.
Figure 8-9 Works with mixed data
8.2.2 NO AUXILIARY RANGE This will be significantly slower as the array will be recalculated each time we use the formula. Step 1 â&#x20AC;&#x201C; Obtain in D1:D10, B1:B10 values but all at once with an array formula reflecting the same basic logic. =COUNTIF($A$1:$A$10;"<"&$A$1:$A$10)+COUNTIF(INDIRECT("A1:A" &ROW($A$1:$A$10)ROW($A$1)+1);$A$1:$A$10) <Ctrl+Shift+Enter>
Figure 8-10 Obtaining the auxiliary range all at once
8-77
Step 2 â&#x20AC;&#x201C; Use this formula in C1 as the second argument of Match, and Index to obtain the name. Copy down till C10 =INDEX($A$1:$A$10;MATCH(ROWS($C$1:C1);COUNTIF($A$1:$A$10;"<"&$A$1:$A$10)+COUN TIF(INDIRECT("A1:A" &ROW($A$1:$A$10)-ROW($A$1)+1);$A$1:$A$10);0)) <Ctrl+Shift+Enter>
Figure 8-11 A1:A10 names in alphabetic order without an auxiliary range
8.3 TOP N VALUES AND CORRESPONDING VALUE IN ANOTHER COLUMN We want to show the Top N scores and corresponding student names based on an exam score.
8.3.1 MANUAL, NO FORMULAS SOLUTION There are many ways to manually obtain Top N data, Filters and Tables are probably the faster and easier ones to use. Note that these solutions need to be applied, by hand, again if data changes. Using Filters Select one of the A1:C11 cells and then Data-Filter
Figure 8-12
8-78
Click Column <Score> Filter Button, Number Filters â&#x20AC;&#x201C; Top 10
Figure 8-13
Choose Top 5 and click <OK>
Figure 8-14
You should obtain:
Figure 8-15
8-79
To order the data you can now use the <score> Filter button. Eventually it will be better to copy the data to another location, remove the filter and then order. If we order it in the original location the only way to go back is using <Undo>
Figure 8-16
Using Tables Tables are very useful because they dynamically adapt to new information. Later in this Chapter we will show an example of this. For this problem just transform the A1:C11 range to a Table. Select A1:C11, Insert-Tables-Table (or press Ctrl T)
Figure 8-17
8-80
Make sure â&#x20AC;&#x153;My table as headersâ&#x20AC;? is ticked
Figure 8-18
Press <OK>
Figure 8-19
Now repeat the same steps used in the Filter solution, till we obtain:
Figure 8-20
If, in the end, you want to remove the table, select a table cell, Right Click -Table-Convert to Range
8-81
8.3.2 NON-ARRAY FORMULA SOLUTION
Figure 8-21 Data and layout for the Top N problem
In cell G1 we can change value of N (from Top N) Step 1 â&#x20AC;&#x201C; Obtain in H2:H11 the Top N scores (in the example Top 5). Write in H2 the formula =MAX(C2:C11) to obtain the largest score. In H3 check if k score is less or equal to 5, obtain the large k score if it is, otherwise check if (k score) = (k-1 score) and return k-1 score, if not return empty string. Copy the formula down till H11. =IF(H2="";"";IF(ROWS($H$2:H3)<=$G$1;LARGE($C$2:$C$11;ROWS($H$2:H3));IF(LARGE($C$2 :$C$11;ROWS($H$2:H3))=H2;H2;"")))
Figure 8-22 Top 5 scores
8-82
Step 3 – Obtain corresponding student names. This will be more difficult because we can have students with the same score. If k score <> (k-1) score just obtain the name corresponding to that score match. For F5 that will be: INDEX($B$2:$B$11;MATCH(H5;$C$2:$C$11;0)) If the score is the same we’re going to obtain the score range immediately after (k-1) name. This way (k-1) score will not be included. Now we’re going to do another match, in this new range, to obtain the next position with the same score. Figure 8-23 explains how to obtain in F5 the 3th student with score 75% (Richard Glenn). This method only works if all the names are different.
Figure 8-23 Obtaining Richard Glenn the 3th student with score 75%
Step 4 –Write in F2 =INDEX($B$2:$B$11;MATCH(H2;$C$2:$C$11;0)) to obtain the first student name In F3 write the previous Figure 8-23 formula an then copy down till F11
=IF(H3="";"";IF(H3<>H2;INDEX($B$2:$B$11;MATCH(H3;$C$2:$C$11;0));INDEX($B$2:$B$11;MATCH(H3; INDEX($C$2:$C$11;MATCH(F2;$B$2:$B$11;0)+1):$C$11;0)+MATCH(F2;$B$2:$B$11;0))))
Note that we can change the G1 value and obtain immediately the
Figure 8-24 Top 5 Scores and corresponding student corresponding Top G1. names
8-83
8.3.3 USING A TABLE AND STRUCTURED REFERENCES And if we want everything to adapt when we enter new data after row 11? The simplest way is to convert the A1:C11 range to a Table. Follow Figure 8-17 to Figure 8-19 instructions. We should create the table first and then write the formulas using Structured References. If we write the formulas first, they will not update correctly. We’re going to change the Table Style, remove the Filter Buttons and rename the table to ”student” (Design Menu). After creating the table the formula in H2 should be =MAX(student[Score]) instead of =MAX(C2:C11). If we select C2:C11 excel will automatically create the Structured Reference. Note how much clear the formula is.
Figure 8-25 Creation of student Table
The formula in H2 should now be: =MAX(student[Score]) and in H3: =IF(H2="";"";IF(ROWS($H$2:H3)<=$G$1;LARGE(student[Score];ROWS($H$2:H3));IF(LARGE(st udent[Score]; ROWS($H$2:H3))=H2;H2;"")))--» copy down till H11 In F2: =INDEX(student[Name];MATCH(H2;student[Score];0)) and in F3: =IF(H3="";"";IF(H3<>H2;INDEX(student[Name];MATCH(H3;student[Score];0));INDEX(student[Name];MATCH(H3;O FFSET(student[Score];MATCH(F2;student[Name];0);0);0)+MATCH(F2;student[Name];0))))
In this formula, to avoid the direct reference to $C$11, INDEX($C$2:$C$11;MATCH(F2;$B$2:$B$11;0)+1):$C$11 was replaced by OFFSET(student[Score];MATCH(F2;student[Name];0). Note that this range is always the size of student[Score]. As we only use this part of the formula if the scores are equal, we will always get a match before the end of the table. This will avoid adding ROWS(student)-MATCH(F2;student[Name] as 4th argument of Offset. --» copy down till F11
8-84
Figure 8-26 Top 5 using a table
With Structured References everything adapts when a new table row is created.
8.3.4 ARRAY FORMULA SOLUTION For big arrays, this solution will be much slower than the non-array solution. Use it only in the rare case of having repeated names. Repeat Step 1 and Step 2 from previous solution, at this point we have the Top 5 scores. Step 3 â&#x20AC;&#x201C; write in F2 =INDEX($B$2:$B$11;MATCH(H2;$C$2:$C$11;0)) to obtain the first student name. Create an array containing all the row positions corresponding to the same score value. This array will replace the case of consecutive equal scores. Use the E2:E11 range to show the array based on H5 score. We have 3 scores equal to H5. =(ROW($C$2:$C$11)-ROW($C$2)+1)/($C$2:$C$11=H5) <Ctrl+Shift+Enter>
Figure 8-27 The array with row numbers corresponding to scores equal to H5
See that we have the 3 rows corresponding to score 75%, errors in all the other cells. Now we can use the Aggregate function [function_num=15 (small), options=6 (ignore errors)] to obtain each one of these positions. Write the formula in F3, replace H5 by H3, and copy down till F11. To know the one we should get, we count the number of equal scores till that point. COUNTIF($H$2:H3;H3) ÂŤ-- used as last argument of Aggregate.
8-85
=IF(H3="";"";IF(H3<>H2;INDEX($B$2:$B$11;MATCH(H3;$C$2:$C$11;0));INDEX($B$2:$B$11;AG GREGATE(15;6;(ROW($C$2:$C$11)-ROW($C$2)+1)/($C$2:$C$11=H3);COUNTIF($H$2:H3;H3))))) <Enter>
Figure 8-28 Top 5 Scores and corresponding student names. Repeated scores solved with an array formula
8.3.5 USING TABLES AND QUERIES Step 1 – Select A1:C11 data, create a table and name it “Student” Step 2 – With the table selected go to Data – Get & Transform - From Table The Query Editor will open
Figure 8-29 Query Editor
8-86
Step 3 - Use Remove Columns option to remove Number Column (the first column)
Figure 8-30
Notice the query default name is equal to the table name Query Editor after deleting Number column Step 4 – Press Close & Load – Close & Load To…
Figure 8-31 Load To options
Choose: Table Existing worksheet, F2 (new table will start at cell F2 in the same worksheet)
8-87
Press <LOAD>
Figure 8-32 Student_2 created containing query result
Step 5 – Note that a new table named “Student_2” was created. Format Score as percentage and change at will the Table Style. Remove, if you want, Filter Buttons
Figure 8-33 Sudent_2 after changing style
8-88
Step 6 – Change SQL text to obtain Top 5 Data – Connections
Figure 8-34 Workbook connections
Select Query – Student Press <Properties> - Definitions
Figure 8-35 SQL Command text
8-89
Change Command Text to:
Press <OK> Press <Close>
Figure 8-36 Changing SQL Command text
Your data should now look like this:
Figure 8-37 Top 5 Query result
If score changes in Student Table, we need to select one of the new Student_2 table cells and then Query - Refresh Or we can use a lithe VBA to do it automatically and an additional bonus to obtain Top N (changing G1 cell) Step 6 – Use the On Change event corresponding to the sheet where the Table is located (In my case Sheet1) – Developer – Visual Basic, select the correct Sheet, First Drop Down – WorkSheet, Second Drop Down – Change
Figure 8-38 Worksheet Change Event
The headers for the Worksheet Change Event will be automatically created. Now write between Private Sub and End Sub:
8-90
Figure 8-39 Complete Worksheet Change Event
Now, with new data or if G1 changes, the Student_2 table will be immediately updated.
Figure 8-40 Automatically refreshing the query and corresponding Student_2 table using Worksheet Change Event
8-91
8.4 CHECK IF A COLUMN RANGE IS ORDERED 8.4.1 ASCENDING ORDER Values of any type in range A1:A10, to be in ascending order A1<=A2, A2<=A3,…A9<=A10. Step 1 – Create a range (D1:D9) that checks if each element is smaller or equal to the next. =A1:A9<=A2:A10 <Ctrl+Shift+Enter>
Figure 8-41 Array checking if each element is smaller or equal to the next. As A1:A10 is ordered all values are TRUE
Step 2 – To be ordered this array must contain only TRUE values. We can apply the And logical function to this array. And will only return TRUE if all elements are TRUE. Formula in C2. =AND(A1:A9<=A2:A10) <Ctrl+Shift+Enter>
Figure 8-42 Formula returns TRUE if A1:A10 is in ascending order, FALSE otherwise.
8-92
Sumproduct and Aggregate solutions We can avoid <Ctrl+Shift+Enter> if we opt to use Sumproduct or Aggregate =SUMPRODUCT(--(A1:A9<=A2:A10))=ROWS(A1:A10)-1 <Enter> or =AGGREGATE(15;4;--(A1:A9<=A2:A10);1)=1 <Enter> (formulas in C2)
Figure 8-43 Sumproduct and Aggregate solutions Formula returns TRUE if A1:A10 is in ascending order, FALSE otherwise.
Timing solutions for a big (A1:A1000) range.
Figure 8-44
As you can see And is faster than Sumproduct or Aggregate solutions. But in this case we must be careful when using And, the formula will not return an error if we forget to <Ctrl+Shift+Enter>, it will return only the result of A1<=A2. That means that we can have a formula returning a wrong value and donâ&#x20AC;&#x2122;t even notice it.
8-93
8.4.2 ASCENDING, DESCENDING, OR NOT ORDERED. In this case, we want a formula to return (1) if A1:A10 elements are in ascending order, (-1) if in descending order and (0) if not ordered. You can think the solution as Or is in ascending Or is in descending Or not ordered. This solution will return (0 â&#x20AC;&#x201C; not ordered) if all elements are equal and it is not very efficient if the values are in ascending order. Why should we check for descending if it is already in ascending? =AND(A1:A9<=A2:A10)+-1*AND(A1:A9>=A2:A10) <Ctrl+Shift+Enter>
Figure 8-45 Checking Ascending/Descending/ Not ordered Formula returning (-1) the A1:A10 values are in descending order
Using an If The problems with the previous solution can be solved using an If function. =IF(AND(A1:A9<=A2:A10);1;IF(AND(A1:A9>=A2:A10);-1;0)) <Ctrl+Shift+Enter>
Figure 8-46 Using an IF to check Ascending/Descending/ Not ordered Formula returning (-1) the A1:A10 values are in descending order
8-94
9 OVERCOMING LOOKUP AND REFERENCE FUNCTIONS LIMITATIONS Lookup functions (Lookup, Vlookup, Match, etc) are very useful but they have two main limitations
9.1 ONLY ONE LOOKUP VALUE LIMITATION The first argument of all these functions is Lookup_value, we can only look for something based on one and just one Lookup_value, and we can only check for equality. Look at Figure 9-1 data, suppose that we want to find out the student name for the one belonging to class T1 with a 75% score.
Figure 9-1
Step 1 â&#x20AC;&#x201C; Create an array (F2:F11) with (1) and (0), (1) if it belongs to T1 and score 75%, (0) otherwise =(C2:C11="T1")*(D2:D11=0.75) <Ctrl+Shift+Enter>
Figure 9-2 Array returning (1) if class T1 and score 75%, (0) otherwise
9-95
Step 2 â&#x20AC;&#x201C; Now we can use the Match function using (1) as Lookup_value Copy the previous formula text and use it as Match second argument. We can now delete F2:F11 range =MATCH(1;(C2:C11="T1")*(D2:D11=0.75);0) <Ctrl+Shift+Enter>
Figure 9-3 Match returning the position for the first student belonging to T1and score 75%,
Step 3 â&#x20AC;&#x201C; Use Index to obtain the corresponding student name =INDEX(B2:B11;MATCH(1;(C2:C11="T1")*(D2:D11=0.75);0))
<Ctrl+Shift+Enter>
Figure 9-4 Index returning the name for the first student belonging to T1and score 75%,
9-96
We can use another method, concatenating Class and Score columns. Step 1 – Create an array with text values, the concatenation of Class and Score =C2:C11&D2:D11 <Ctrl+Shift+Enter>
Figure 9-5 The concatenation of Class and Score Columns
Step 2 – Now we can use the Match function using “T10.75” as Lookup_value Copy the previous formula text and use it as Match second argument. We can now delete F2:F11 range =MATCH("T10.75";C2:C11&D2:D11;0) <Ctrl+Shift+Enter>
Figure 9-6 Matching “T10.75”
9-97
Step 3 â&#x20AC;&#x201C; Use Index to obtain the corresponding student name =INDEX(B2:B11;MATCH("T10.75";C2:C11&D2:D11;0)) <Ctrl+Shift+Enter>
Figure 9-7 Index to obtain the corresponding student name
9.2 FIRST MATCH LIMITATION If there is more than one match, Lookup and Reference functions return only the first match. Look at the Figure 9-8 data, we want to obtain in G2:G6 the student names belonging to G1 class.
Figure 9-8
9-98
9.2.1 NON-ARRAY SOLUTION WITH AN AUXILIARY RANGE We’re going to use an auxiliary range (H2:H6) to obtain the corresponding row numbers (positions) for all the students belonging to G1 class. Step 1 – Obtain in H2 the first match =MATCH(G1;C2:C11;0)
Figure 9-9 Returning the first match for G1 class
Step 2 – As the first match returns 5, we want to do a second match starting at row 6 (C7:C11) This range should be created based on previous match. This way the formula can be copied down to return all the other matches. We can use Index or Offset to obtain the range INDEX($C$2:$C$11;H2+1):$C$11 or OFFSET($C$2;H2;0;ROWS($C$2:$C$11)-H2) Now match, in this new range, the G1 class. Write in H3 =MATCH($G$1;INDEX($C$2:$C$11;H2+1):$C$11;0)
Figure 9-10 Second Match
The formula returns (2), and it is correct, in range C7:C11 the first “T3” appears in row 2. But we want positions based on C2 (beginning of original range). The only thing we need to do is to add the previous position (H2)
9-99
Step 3 – Add the previous position to Match result and copy the formula down till H6. =MATCH($G$1;INDEX($C$2:$C$11;H2+1):$C$11;0)+H2
Figure 9-11 All G1 matches, an error means no more matches
Step 4 - Use iferror to remove the errors, replace H3 formula by =IF(H2="";"";IFERROR(MATCH($G$1;INDEX($C$2:$C$11;H2+1):$C$11;0)+H2;""))
Figure 9-12 All G1 matches, errors replaced by empty space “”
9-100
Step 5 – Use Index in G2:G6 to obtain the corresponding student names. Use the formula in G2 and copy down till G6 =IF(H2="";"";INDEX($B$2:$B$11;H2))
Figure 9-13 All student names corresponding to G1 class
9.2.2 NON-ARRAY SOLUTION WITH NO AUXILIARY RANGE We can use the previous chapter technique for Top N described back in Figure 8-23 Step 1 –Write in G2 =INDEX($B$2:$B$11;MATCH($G$1;$C$2:$C$11;0)) to obtain the first student name Follow Figure 8-23 logic to write in G3 (we obtain the range used in the kth G1 match based on the match for (k-1)th name) =IF(G2="";"";IFERROR(INDEX($B$2:$B$11;MATCH($G$1;INDEX($C$2:$C$11;MATCH(G2;$B$2: $B$11;0)+1):$C$11;0)+MATCH(G2;$B$2:$B$11;0));""))
Figure 9-14 All student names corresponding to G1 class, no auxiliary range
The range used in the kth G1 match depends on (k-1)th name
9-101
9.2.3 ARRAY SOLUTION WITH NO AUXILIARY RANGE Step 1 – Create an array with TRUE if the class (C2:C11) is equal to G1, FALSE otherwise. Write in F2:F11 =$C$2:$C$11=$G$1 <Ctrl+Shift+Enter>
Figure 9-15 TRUE class, = G1 FALSE otherwise
Step 2 – Transform the value TRUE to the corresponding C2:C11 row number, FALSE to an error. This way we obtain an array with the row number of all values in C2:C11 equal to the G1 value. To obtain that we’re going to divide row number by the previous array, as FALSE is (0) we will obtain, in that case, an error. =(ROW($C$2:$C$11)-ROW($C$2)+1)/($C$2:$C$11=$G$1) <Ctrl+Shift+Enter>
Figure 9-16 Row number if class, = G1 error otherwise
9-102
Step 3 â&#x20AC;&#x201C; Now we can use the Aggregate function [function_num=15 (small), options=6 (ignore errors)] to obtain each one of these row numbers. The text, from previous formula, can now be copied and pasted as third argument. The F2:F11 range can now be deleted. Write in G2, copy down till G6 =AGGREGATE(15;6;(ROW($C$2:$C$11)-ROW($C$2)+1)/($C$2:$C$11=$G$1);ROWS($G$2:G2)) <Enter>
Figure 9-17 Row numbers for class, = G1
Step 4 â&#x20AC;&#x201C; Use Index to obtain the corresponding name and IfError to remove the errors. =IF(G1="";"";IFERROR(INDEX($B$2:$B$11;AGGREGATE(15;6;(ROW($C$2:$C$11)ROW($C$2)+1)/($C$2:$C$11=$G$1);ROWS($G$2:G2)));"")) <Enter>
Figure 9-18 Names for students in G1 class
We only need the if for the formulas in G3 and below. If the cell, containing the class, is other than G1, it should even be removed.
9-103
9.2.4 CONVERTING TO A TABLE AND USING STRUCTURED REFERENCES - Convert the range A1:D11 to a table, name it “Student” - Replace the G2 formula by: =IF(G1="";"";IFERROR(INDEX(Student[Name];AGGREGATE(15;6;(ROW(Student[Class])ROW(Student[#Headers]))/(Student[Class]=$G$1);ROWS($G$2:G2)));"")) «-- copy down till G6
Figure 9-19 Using a table and Structured References
Note how ROW($C$2:$C$11)-ROW($C$2)+1 was replaced by ROW(Student[Class])ROW(Student[#Headers]) to obtain the array with 1,2,3,… till last Student[Class] row. ROW(Student[#Headers]) is the row number corresponding to the Student Table Headers. Notice that we add a new student in class “T3” and the formulas adapt to include it. That’s one of the advantages of using Tables and Structured references.
9-104
10 DISTINCT VALUES Weâ&#x20AC;&#x2122;re going to show different methods to calculate the number of distinct values in a range, and the distinct values themselves.
10.1 NUMBER OF DISTINCT NUMERIC VALUES 10.1.1
FREQUENCY SOLUTION
Frequency calculates how often values (data_array) occur within a range of values (bins_array), and then returns a vertical array of numbers. This array contains the number of values in data_array that falls in the intervals represented in bins_array. The values in bins_array correspond to the greatest value in each interval. Frequency returns an array and should be used with <Ctrl+Shift+Enter>. Prior to enter the function we need to select a range one row bigger than bins_array. This last element returns the count of any values above the highest interval. Suppose that we want to count how many scores fall in the intervals: [0â&#x20AC;Ś60%] [61%...70%] [71%...80%] We must create one bins_array with 3 rows containing (F2:F4): 60% 70% 80% To use Frequency we must select a column range with 4 rows (G2:G5) ,the last one represents how many values greater than 80% =FREQUENCY(D2:D11;F2:F4) <Ctrl+Shift+Enter>
Figure 10-1 Using Frequency
10-105
To calculate the number of distinct values we’re going to take advantage of one of the Frequency particularities. Frequency returns zero for the, eventual, repeated values in bins_array. Suppose we want to know how many distinct scores. We’re going to use the range, containing the scores, for both data_array and bins_array. As all values in data_array exist in bins_array, the first time a score appears in bins_array Frequency returns the frequency of that value, posterior appearances of the same value will return zero. Step 1 - Select F2:F12 and write: =FREQUENCY(D2:D11;D2:D11) <Ctrl+Shift+Enter>
Figure 10-2 Using scores as both data_array and bins_array
Note that the first time we have 70% Frequency returns two (we have two 70% scores), but for the second 70% Frequency returns zero. Step 2 – Replace all the numbers greater than zero by the value one. =--(FREQUENCY(D2:D11;D2:D11)>0) <Ctrl+Shift+Enter>
Figure 10-3 All values >0 replaced by 1
10-106
Step 3 – Use Sumproduct to sum the values in the array, this will be the number of distinct values. You can now delete the F2:F12 range and write in E2: =SUMPRODUCT(--(FREQUENCY(D2:D11;D2:D11)>0)) <Enter>
Figure 10-4 Using Frequency to obtain the number of distinct values in a numeric range
10.1.2
COUNTIF SOLUTION
Step 1 – Create an array that counts, for each D2:D11 value, the number of times it appears in D2:D11. That means how many values in D2:D11 = D2, D2:D11=D3…, D2:D11 =D11. Note that the result is similar to the Frequency one but, for repeat values, instead of zero we obtain the same previous value. The second time we have 70% we have again two instead of zero. =COUNTIF(D2:D11;D2:D11) <Ctrl+Shift+Enter>
Figure 10-5 Countif to obtain how many values in D2:D11 are equal to each D2:D11 value
10-107
Step 2 â&#x20AC;&#x201C; Divide 1 by the previous array That way a value occurring N times will show 1/N. But, as that value appears N times, we have, for the same value, N x 1/N, meaning that it will always sum one for the same value. =1/COUNTIF(D2:D11;D2:D11) <Ctrl+Shift+Enter>
Figure 10-6 The sum of this array will be the number of distinct values
Note that now for 70% we have, for the first occurrence, 0.5 and again 0.5 for the other occurrence (0.5+0.5 = 1). Step 3 â&#x20AC;&#x201C; Use Sumproduct to sum the values in the array, this will be the number of distinct values. You can now delete the F2:F11 range and write in E2: =SUMPRODUCT(1/COUNTIF(D2:D11;D2:D11)) <Enter>
Figure 10-7 Using Countif to obtain the number of distinct values in a numeric range
10-108
10.1.3
MATCH SOLUTION
Step 1 â&#x20AC;&#x201C; Create an array containing the row number corresponding to the first time we find each D2:D11 value inside D2:D11. Note that, each time we have the same score, we have the same match result. Write in F2:F11 =MATCH(D2:D11;D2:D11;0) <Ctrl+Shift+Enter>
Figure 10-8 Matching each D2:D11 value in D2:D11
Step 2 â&#x20AC;&#x201C; The first time a value occurs we want the value one, otherwise zero. That will happen if the row number is equal to the match result, then apply (--) to transform Boolean values into zeros and ones. The sum of this array will be the number of distinct values. =--(ROW(D2:D11)-ROW($D$2)+1=MATCH(D2:D11;D2:D11;0)) <Ctrl+Shift+Enter>
Figure 10-9 1 - First time a value occurs 0 - Otherwise
10-109
Step 3 – Use Sumproduct to sum the values in the array, this will be the number of distinct values. You can now delete the F2:F11 range and write in E2: =SUMPRODUCT(--(ROW(D2:D11)-ROW($D$2)+1=MATCH(D2:D11;D2:D11;0))) <Enter>
Figure 10-10 Using Match to obtain the number of distinct values in a numeric range
10.1.4
USING A TABLE AND STRUCTURED REFERENCES
Create a Table, name it “Student” and write in F2, the corresponding last E2 formula, now using Structured References: =SUMPRODUCT(--(ROW(Student[Score])ROW(Student[#Headers])=MATCH(Student[Score];Student[Score];0))) <Enter>
Figure 10-11 Number of distinct values with a Table and Structured References
Note that we add a new student with score 68% and the formula adapts to include it.
10-110
10.1.5
TIMING THE SOLUTIONS
The 3 solutions timed with a big array (A1:A2000) containing 492 distinct values.
Figure 10-12
The Frequency solution is the best for a range containing numeric values, it was about 100 times faster than Countif and about 4 times faster than Match. Countif is elegant but disappointing slow, should only be used with very short ranges.
10.2 NUMBER OF DISTINCT TEXT VALUES 10.2.1
FREQUENCY/MATCH SOLUTION
Frequency only works with data_array and bins_array containing numeric values, we need to adapt the previous solution. Suppose we want to know how many different C2:C11 classes. We can use an array containing the matches for each C2:C11 value in C2:C11 as data_array and an array containing all possible row numbers as bins_array. Step 1 – Create the Match array and row numbers array Write in F2:F11 =MATCH(C2:C11;C2:C11;0) <Ctrl+Shift+Enter> And in G2:G11 =ROW(C2:C11)-ROW($C$2)+1 <Ctrl+Shift+Enter>
Figure 10-13 F2:F11 – bins_array G2:G11 – data_array
10-111
Step 2 – Use Frequency with the formulas for data_array and bins_array We can now delete F2:F11 and G2:G11 ranges Write in F2:F12 =FREQUENCY(MATCH(C2:C11;C2:C11;0);ROW(C2:C11)-ROW($C$2)+1)
<Ctrl+Shift+Enter>
Figure 10-14 Result for Frequency
Notice that if we don’t have a match equal to the corresponding row number Frequency returns Zero. This method uses a similar logic to the Match one used in Figure 10-10 Step 3 – Use Sumproduct to count how many of these values are greater than zero. We can now delete F2:F12 range =SUMPRODUCT(--(FREQUENCY(MATCH(C2:C11;C2:C11;0);ROW(C2:C11)-ROW($C$2)+1)>0)) <Enter>
Figure 10-15 Frequency used to obtain the number distinct classes
10-112
10.2.2
COUNTIF SOLUTION
The solution is the same for numeric or non-numeric values Write in E2 =SUMPRODUCT(1/COUNTIF(C2:C11;C2:C11)) <Enter>
Figure 10-16 CountIf used to obtain the number distinct classes
10.2.3
MATCH SOLUTION
The solution for numeric values will still work for non-numeric. Write in E2 =SUMPRODUCT(--(ROW(C2:C11)-ROW($C$2)+1=MATCH(C2:C11;C2:C11;0)))
<Enter>
Figure 10-17 Match used to obtain the number of distinct classes
10-113
10.2.4
TIMING SOLUTIONS
As Frequency/Match and Match use a similar logic they are expected to have a similar performance. The 3 solutions timed with a big text array (C1:C2000) containing about 100 distinct values.
Figure 10-18
Values for Frequency/Match and Match are very similar but, in my tests, Match alone was always slightly faster and, in my opinion, easier to read/understand. The 3 solutions timed with a big text array (B1:B2000) containing about 1900 distinct values.
Figure 10-19
Note that even the number of distinct values will influence, in a different way for each formula, the performance. Many distinct values and Countif performance grows. For solutions using Match, performance significantly drops when the number of distinct values grows (10 times slower from 100 distinct values to 1900 distinct values).
10-114
10.3 OBTAINING THE DISTINCT VALUES We are going to use different methods in order to obtain in G2:G6 the C2:C11 distinct classes
10.3.1
MANUAL METHOD – USING REMOVE DUPLICATES
Step 1 –Remove Duplicates always remove the duplicates in place, we need to copy the data to the destination. Copy C1:C11 values to G1:G11.
Figure 10-20
Step 2 – Select G1:G11, go to Data – Remove Duplicates
Figure 10-21
10-115
Step 3 â&#x20AC;&#x201C; Make sure My Data has headers is ticked Press <OK>
Figure 10-22
Step 4 â&#x20AC;&#x201C; Press <OK> G2:G4 now contains the C1:C11 distinct values
Figure 10-23 Using Remove Duplicates
10-116
10.3.2
MANUAL METHOD – USING ADVANCED FILTER
Step 1 – Select C1:C11, go to Data Advanced Filter
Figure 10-24
Step 2 – Choose Copy to another location List range: $C$1:$C$11 Criteria range: <empty> Copy to: G1 Tick Unique records only (repetitions will not be copied) Press <OK>, G2:G4 now contains the C1:C11 distinct values
Figure 10-25 Using Advanced Filter
10-117
10.3.3
MATCH WITH AGGREGATE FORMULA SOLUTION (STANDARD REFERENCES)
Step 1 – Create an array containing the row number corresponding to the first time we find each C2:C11 value inside C2:C11. Note that, each time we have the same score, we have the same match result. This is the same method used back in Figure 10-8. Write in F2:F11 =MATCH($C$2:$C$11;$C$2:$C$11;0) <Ctrl+Shift+Enter>
Figure 10-26 Matching each C2:C11 value in C2:C11
Step 2 – The first time a value occurs we want the value one, otherwise zero. That will happen if the row number is equal to the match result, then apply (--) to transform Boolean values into zeros and ones. The sum of this array will be the number of distinct values. =--(ROW($C$2:$C$11)-ROW($C$2)+1=MATCH($C$2:$C$11;$C$2:$C$11;0)) <Ctrl+Shift+Enter>
Figure 10-27 1 - First time a value occurs 0 – Otherwise
10-118
Step 3 – Divide row number by the previous array Doing that the first time the class occurs we have the row number, error otherwise =(ROW($C$2:$C$11)-ROW($C$2)+1)/(ROW($C$2:$C$11)ROW($C$2)+1=MATCH($C$2:$C$11;$C$2:$C$11;0)) <Ctrl+Shift+Enter>
Figure 10-28 Row number - First time a value occurs Error - Otherwise
Step 4 – Copy the formula text from last array and use it as third Aggregate argument [function_num=15 (Small), Options=6 (ignore errors)]. Use then Index to obtain the corresponding class. We can now delete the F2:F11 range Write in G2: (to obtain first class) =C2 and then in G3: (copy down till G6) =IF(G2="";"";IFERROR(INDEX($C$2:$C$11;AGGREGATE(15;6;(ROW($C$2:$C$11)ROW($C$2)+1)/ (ROW($C$2:$C$11)-ROW($C$2)+1=MATCH($C$2:$C$11;$C$2:$C$11;0));ROWS($G$2:G3)));"")) <Enter>
Figure 10-29 Distinct C2:C11 Classes
10-119
10.3.4
MATCH WITH AGGREGATE FORMULA SOLUTION (STRUCTURED REFERENCES)
Create a Table, name it “Student” and write in G2: =INDEX(Student[Class];1) <Enter> and in G3, the corresponding last G3 formula, now using Structured References: (copy down till G6) =IF(G2="";"";IFERROR(INDEX(Student[Class];AGGREGATE(15;6;(ROW(Student[Class])ROW(Student[#Headers]))/(ROW(Student[Class])ROW(Student[#Headers])=MATCH(Student[Class]; Student[Class];0));ROWS($G$2:G3)));"")) <Enter>
Figure 10-30 Distinct C2:C11 Classes using a Table and Structured References
Notice that we add a new student in class T4 and the formula adapts to include it.
10.3.5
USING A PIVOT TABLE
We’re going to use a Pivot Table to obtain the distinct classes and the corresponding score average Step 1 - Create a Table using the A1:D11 data and name it “Student” Step 2 – Select the table, go to Insert – Pivot Table
Figure 10-31 Creating the Pivot Table
10-120
Choose: Existing Worksheet, Location: G1, Press <OK>
Figure 10-32 Pivot Table Fields
Select Class and Score or Drag Class to Rows and Score to ∑ Values Step 3 – Change Pivot Table name (Analyze – Pivot Table Name: StuPivot)
Figure 10-33 Changing Pivot Table Name
10-121
Step 4 – Right Click one of the scores, choose Value Field Settings
Figure 10-34 Changing Field Settings
Select Average, Custom Name “Score Av” Press <Number Format>, choose Percentage, Press <OK> Step 5 – Remove Grand Total, Go to Design – Grand Totals – On for Rows Only or Analyze – Options – Total & Filters Remove tick from: Show grand totals for columns
Figure 10-35 Removing Grand Total
10-122
Your Pivot should now look like this:
Figure 10-36 Pivot Table final look
If Student Table data changes, we need to manually Refresh the Pivot in order to reflect changes. We’re going to use a little VBA to solve this problem. Step 6 – Use the On Change event corresponding to the sheet where the Pivot Table is located (In my case Sheet1) – Developer – Visual Basic, select the correct Sheet, First Drop Down – WorkSheet, Second Drop Down – Change
Figure 10-37 VBA WorkSheet Change Event
The headers for the Worksheet Change Event will be automatically created. Now write between Private Sub and End Sub: VBA Worksheet Change Event PRIVATE SUB WORKSHEET_CHANGE(BYVAL TARGET AS RANGE) ' ONLY UPDATE IF STUDENT CLASS OR SCORE CHANGES IF NOT (APPLICATION.INTERSECT(RANGE("STUDENT[[CLASS]:[SCORE]]"), TARGET) IS NOTHING) THEN 'UPDATE PIVOT ME.PIVOTTABLES("STUPIVOT").PIVOTCACHE.REFRESH ' ORDER BY CLASS ME.PIVOTTABLES("STUPIVOT").PIVOTFIELDS("CLASS").AUTOSORT XLASCENDING, "CLASS" END IF END SUB
10-123
Now, each time Class or Score changes in Student Table, the StuPivot Pivot Table will refresh and Class will be ordered.
Figure 10-38 Using WorkSheet Change Event Pivot Table automatically refreshes
10-124
11 CONDITIONAL FORMATTING Conditional Formatting is one of my favorite Excel features. Conditional Formatting, opposing to normal formatting, is dynamic, we can use rules to say when some formats should be applied.
11.1 CREDIT CARD VALIDATION We start with the Credit card validation example. We have a list of orders in a table called Torder, we want lines with wrong credit card numbers formatted with a red background.
Figure 11-1 Torder Table
Step 1 - Select the data part of the table (A2:C5) Step 2 â&#x20AC;&#x201C; Go to Home-Conditional Formatting- New rule Select Use a formula to determine which cells to format
Figure 11-2 New Rule
11-125
The formula must return True or False, If the formula evaluates to True Excel will apply the chosen format. You should write the formula based on the first selected cell, in this case A2. Excel will, behind the curtains, copy the formula to all other selected cells. Remember that, in this case, we want all the row 1:3 to be formatted. A2 should be formatted when we have a wrong credit card in B2. The formula should check the value in B2 and, as the formula must be copied right and down, the reference to B2 should be $B2. Step 3 - Write the formula: (This is the formula from chapter 6.9 – Credit Card Validation, adapted to B2 and changing =0 to <>0) =MOD(SUMPRODUCT(MID(REPT("0";16LEN($B2))&$B2;ROW($1:$16);1)+MOD(ROW($1:$16);2)*(MID(REPT("0";16LEN($B2))&$B2;ROW($1:$16);1)-(2*MID(REPT("0";16LEN($B2))&$B2;ROW($1:$16);1)>9)*9));10)<>0 Press <Format> Fill – choose a red color Press <OK> then <Apply>
Figure 11-3 Conditional Formatting showing a wrong card number for order 3
Notes: Conditional Formatting will not accept direct Structured References. Conditional Formatting does not need <Ctrl+Shift+Enter> When using a big formula, it is best to write and test it in the worksheet and then copy the formula text into Conditional Formatting.
11.2 CHANGING ACTIVE ROW BACKGROUND When entering or visualizing data its, sometimes, very useful to change the background for the Active Row. Let us see how we can achieve this. In Excel we can use the formula =CELL(“row”) to obtain the active row number, =ROW() to obtain the row number of the cell which contains the formula.
Figure 11-4 Obtaining Active Row
11-126
But, obviously, if the selected cell changes the formula will not update unless the worksheet recalculates.
Figure 11-5
Pressing <F9> and the formula updates
Figure 11-6 Pressing <F9> key to force recalculation
Step 1 â&#x20AC;&#x201C; Create a table with the student data, name it Student
Figure 11-7 The Student Table
11-127
Step 2 – Select the data part of the table (A2:D11) Go to Home – Conditional Formatting – New Rule Select Use a formula to determine which cells to format To check if the row corresponds to the active row write: =ROW($A2)=CELL(“row”) or CELL(“row”) =ROW()
Figure 11-8 New Formatting Rule
Press <Format> - Fill – choose an orange background Press <OK> then <Apply>
Notice that row 2 changed to orange. Press <OK>
Figure 11-9 Apply the Rule
But, if we choose another row, the Conditional Formatting formula will not update unless we press <F9>
11-128
Figure 11-10 Selecting another row will not force the formula to update
Step 3 â&#x20AC;&#x201C; Use the Worsheet Selection Change Event to force recalculation In Developer â&#x20AC;&#x201C; Visual Basic, Double Click the corresponding sheet, select the Selection Change Event and write:
Figure 11-11 Worksheet Selection Change Event
Now, when we change the active row inside Student Table, the formula updates and the active row will change color.
Figure 11-12 Changing row and Background changes automatically
11-129
And adapts automatically as table grows.
Figure 11-13 Adapting when new rows enter Student Table
11.3 STUDENT NAMES IN BOLD FOR THE BEST SCORE IN EACH CLASS Step 1 â&#x20AC;&#x201C; Create an array with score values for those belonging to C2 class (class values start at C2) Reference to C2 must be relative to adapt when used in Conditional Formatting and references do $D$2:$D$11 and $C$2:$C$11 must be absolute. Write in G2:G11 =$D$2:$D$11*($C$2:$C$11=C2)
<Ctrl+Shift+Enter>
Figure 11-14 Scores for students belonging to C2 class
This formula, when used in Conditional Formatting, will adapt (C2 â&#x20AC;&#x201C; relative reference) to all other C3:C11 values, creating a different array with the corresponding scores for the C2:C11 classes.
11-130
Note that, for classes different from the one in C2, the value is 0. This can be enough because we want the maximum value in each class, but will fail if, for instance, we have a class only with zeros or want to minimize instead. To be on the safe side we are going to replace the ones from different classes with “” (empty string) Step 2 – Replace the G2:G11 formula by: =IF($C$2:$C$11=C2;$D$2:$D$11;"") <Ctrl+Shift+Enter>
Figure 11-15 Classes different from C2 replaced by <empty string>
Step 3 – In Conditional Formatting we want to obtain the maximum from this array and format the corresponding student name if the score is equal to this maximum. Select B2:B11 Go to Home – Conditional Formatting – New Rule Select Use a formula to determine which cells to format Write the formula:=D2=MAX(IF($C$2:$C$11=C2;$D$2:$D$11;""))
Figure 11-16 Conditional Formatting – Student names in bold for the best score in each class
11-131
12 DATA VALIDATION With Data Validation we can create rules to control what a user can in a cell.
12.1 CREDIT CARD VALIDATION We start with the Credit card validation example. We have a list of orders in a table called Torder, we want to accept only valid Credit Card numbers in column B.
Figure 12-1 Torder Table
Step 1 - Select Credit Card column data (B2:B4) Step 2 - Go to Data – Data Validation - Settings Choose Custom
Figure 12-2 Data Validation
The formula must return True or False, If the formula evaluates to True Excel will accept the value, otherwise the value will not be accepted. You should write the formula based on the first selected cell, in this case B2. Excel will, behind the curtains, copy the formula to all other selected cells. Step 3 - Write the formula: (This is the formula from chapter 6.9 – Credit Card Validation, adapted to B2) =MOD(SUMPRODUCT(MID(REPT("0";16LEN(B2))&B2;ROW($1:$16);1)+MOD(ROW($1:$16);2)*(MID(REPT("0";16LEN(B2))&B2;ROW($1:$16);1)-(2*MID(REPT("0";16-LEN(B2))&B2;ROW($1:$16);1)>9)*9));10)=0
12-132
Step 4 – Go, if you want, to Error Alert to write your personalized Error Message
Figure 12-3 Data Validation – Error Alert
Step 5 – Testing Enter a new order (4) with a wrong Card number.
Figure 12-4 Testing Data Validation
Notes: Data Validation will not accept direct Structured References. Data Validation formulas don’t need <Ctrl+Shift+Enter> When using a big formula, it is best to write and test it in the worksheet and then copy the formula text into Data Validation formula.
12-133
12.2 MAKING SURE WE DON’T ENTER REPEATED ORDER NUMBERS Starting with Torder table, we want to make sure we don’t enter a repeated order number.
Figure 12-5 Torder Table
Step 1 - Select Order column data (A2:A4) Step 2 - Go to Data – Data Validation - Settings Choose Custom Enter the formula: =COUNTIF(INDIRECT("Torder[Order]");A2)=1
Figure 12-6 Data Validation - Custom
Remember that we cannot use direct Structured References in Data Validation, so we use Indirect to refer to Torder Order column. This way it will work when we enter a new order (new Torder row) Step 3 – Testing Enter a new order (4) with an already user order number.
Figure 12-7 Testing Data Validation
12-134
12.3 CREATING DROP DOWN LISTS Creating Drop Down Lists is one of the most Data Validation common uses
12.3.1
A LIST OF COUNTRIES – NOT ADAPTABLE
Figure 12-8 Data Validation – simple Drop Down example
We want a drop down in B3 containing the countries in E3:E5 range. Step 1 – Select B3 cell Step 2 – Go to Data – Data Validation, Choose List, write in Source: =$E$3:$E$5
Figure 12-9 Data Validation – Using values from a List
Press OK
Figure 12-10 Data Validation – Creating a Drop Down Box
B3 cell contains now a Drop Down Box where one of the countries can be selected. If we add a new country in E6 the Drop Down will not adapt.
12-135
12.3.2
A LIST OF COUNTRIES – ADAPTABLE (USING A FORMULA)
In order for the source range to adapt we need a formula that returns an adaptable range. We can use Offset and the Counta solution shown in 4.1.1 Counta will be used as 4th Offset argument to define the number of rows in the new range. The range used with Counta should be big enough to include all the possible new countries. In the example we’re going to use E3:E12. Step 1 – Select B3 cell Step 2 – Go to Data – Data Validation, Choose List, write in Source: =OFFSET(E3;0;0;COUNTA(E3:E12);1)
Figure 12-11 Data Validation – Using a formula to return the source range
Now the Drop Down in B3 will adapt when we include a new country.
Figure 12-12 Data Validation – Drop Down adapts to include the new country in E6
12-136
12.3.3
A LIST OF COUNTRIES – ADAPTABLE (USING TABLES)
Transforming the countries range into a table is the easier and most adaptable solution. Step 1 – Select E2:E5 and transform it into a Table, name the table Countries.
Figure 12-13 Countries table
Step 2 – Select B3, go to Data – Data Validation, Choose List, Write in Source: =INDIRECT("Countries") (remember that we cannot use direct Structured References in Data Validation)
Figure 12-14 Using a Table as Source
With introduction of new data, the Table adapts, and the Drop Down adapts accordingly
Figure 12-15 Data Validation – Drop Down adapts as Countries Table grows
12-137
12.3.4
CITY DROP DOWN CONTENT DEPENDING ON SELECTED COUNTRY IN COUNTRY DROP DOWN
That means we have two Drop Downs one depending on the other.
Figure 12-16 Data to fill the B3 and C3 Drop Down
In B3 we want a Drop Down to select the Country (based on E3:E5 values) In C3 we want a Drop Down to select a city corresponding to the chosen country in B3. G3:G5 contains the cities of England, I3:I5 the cities of France and K3:K6 the cities of Portugal. Step 1 Select E2:E5, create a table named Countries (as we did in 12.3.3) Select G2:G5, create a table named England Select I2:I5, create a table named France Select K2:K6, create a table named Portugal Note the name used for each country table should match the name used in E3:E5
Figure 12-17 Creating the 4 Tables
Step 2 - Select B3, go to Data â&#x20AC;&#x201C; Data Validation, Choose List, Write in Source: =INDIRECT("Countries")
12-138
Step 3 - Select B3, go to Data â&#x20AC;&#x201C; Data Validation, Choose List, Write in Source: =INDIRECT(B3)
When we change the country in B3, the Drop Down in C3 adapts, to show the corresponding cities.
As we use tables, the Drop Downs update automatically, when a new city is introduced.
12-139
13 DATES AND ARRAY FORMULAS A Date is just an integer number corresponding to the number of days since a reference date (190001-01). Try this: Write the value 5 in cell A1.
Now format cell A1 as Date, you will obtain:
13.1 ARRAY FORMULA TO OBTAIN ALL DATES BETWEEN TWO GIVEN DATES Starting date in A1, ending date in A2 Step 1 â&#x20AC;&#x201C; select a range with enough cells to contain all dates from A1 to A2. For the example select C1:C15 Write: =ROW(INDIRECT(A1&":"&A2)) <Ctrl+Shift+Enter>
Figure 13-1 Date serial numbers corresponding to all dates between A1 and A2 dates
13-140
Step 2 – Format, if you want, C1:C15 as date
Figure 13-2 All dates between A1 and A2 dates
13.2 HOW MANY FRIDAYS THE 13TH BETWEEN TWO GIVEN DATES Starting date in A1, ending date in A2 Step 1 – Create the array formula to obtain all dates between A1 and A2 To try it we don’t need a range with all necessary size, just obtain the first 15 or so. Remember, in the end, we don’t need this range. Select C1:C15 and write: =ROW(INDIRECT(A1&":"&A2)) <Ctrl+Shift+Enter>
Figure 13-3 Date serial numbers corresponding to the first 15 A1 to A2 dates
13-141
Step 2 – Check, for all dates, if it is a Friday the 13th( (1 – date corresponds to a Friday the 13th, 0 – not a Friday the 13th). We are going to use the Weekday and Day functions. Weekday will return the day of the week (a number between 1-7 or 0-6). If we omit the last argument the default is 1, meaning the first day of the week will be Sunday (value 1) and Friday will correspond to value 6. Write in C1:C15: =(WEEKDAY(ROW(INDIRECT(A1&":"&A2)))=6)*(DAY(ROW(INDIRECT(A1&":"&A2)))=13) <Ctrl+Shift+Enter>
Figure 13-4 1 – Date corresponds to a Friday the 13th 0 – Not a Friday the 13th
Step 3 – The sum of all these zeros and ones will be the number of Friday the 13th between A1 and A2 dates. Copy the text from the previous formula and use it with the Sumproduct function. The C1:C15 range can now be deleted. Write in B1: =SUMPRODUCT((WEEKDAY(ROW(INDIRECT(A1&":"&A2)))=6)*(DAY(ROW(INDIRECT(A1&":"& A2)))=13))
Figure 13-5 In B1, number of Friday the 13th from A1(2006-01-01) till A2(2017-12-31)
13-142
13.3 HOW MANY MONTHS WITH 5 COMPLETE WEEKENDS BETWEEN TWO GIVEN DATES? To use complete months, starting date(A1) will correspond to the beginning of a month, and ending date(A2) will correspond to the end of a month. The general idea is to find out how many months contain 5 Saturdays + 5 Sundays, 10 weekend days. If a month contains 10 of these days, it will contain 5 complete weekends. Starting a month with a Sunday, ending with a Saturday, and summing 10 is impossible for months with 28, 29, 30 or 31 days. In any given month, if there are 5 Saturdays and 5 Sundays, we have 5 complete weekends. Note that neither is possible to obtain a value greater than 10. Step 1 – Create an array to check, for all days between A1 and A2, if it is a Saturday or Sunday. We’re going to use the Weekday function, second argument (2), week starts Monday, any value greater than 5 will be a weekend day. We’re going to use range D1:D25 to, partially, obtain the result. Select D1:D25 and write the array formula: =WEEKDAY(ROW(INDIRECT(A1&":"&A2));2)>5 <Ctrl+Shift+Enter>
Figure 13-6 TRUE – Weekend day, FALSE otherwise
13-143
Step 2 – Replace FALSE by zero and TRUE by a numeric value that uniquely identifies the year and month. Being d [ROW(INDIRECT(A1&":"&A2))] any date from A1 to A2, the value will be calculated as: 12*(Year(d)-Year(A1)) + Month(d) This will return 1 to 12 for the first year, 13 to 24 for the second, 25 to 36 for the third, etc. Replace the D1:D25 formula by: =(WEEKDAY(ROW(INDIRECT(A1&":"&A2));2)>5)*(12*(YEAR(ROW(INDIRECT(A1&":"&A2) ))-YEAR(A1))+MONTH(ROW(INDIRECT(A1&":"&A2)))) <Ctrl+Shift+Enter>
Figure 13-7 <> 0 – Saturday or Sunday 0 – not a weekend day
Note that this is only part of the total array. In the same year/month the <> 0 value will be always the same (obtained by M = 12*(Year(d)-Year(A1)) + Month(d)The formula produces a unique value for any Year and Month combination. The idea is to use the Frequency function to count how many of those values exist in each month. The previous formula will be our data array to be used with Frequency. The bin array should contain values from 0 to all possible month values (M) In the example A1 = 2016-07-01 and A2 = 2017-12-31, we have a value of M for two different years (2016 and 2017) so bins array should contain values from 0 to 24. If Year(A2)=2018 bins array should be 0 to 36. The upper limit can be calculated by the formula: 12*(1+(Year(A2)Year(A1))
13-144
To obtain (for the example) an array containing values 1 to 24 we can use the formula: =ROW(INDIRECT("1:" & 12*(1+(YEAR(A2)-YEAR(A1))))) But, as we want a value 0 to 24, the formula should be: =ROW(INDIRECT("1:" & 1+12*(1+(YEAR(A2)-YEAR(A1)))))-1 Step 3 â&#x20AC;&#x201C; Obtain bins array in F1:F25 Select F1:F25 and write: =ROW(INDIRECT("1:" & 1+12*(1+(YEAR(A2)-YEAR(A1)))))-1 <Ctrl+Shift+Enter>
Figure 13-8 Bins Array
13-145
Step 4 â&#x20AC;&#x201C; Using Frequency in H1:H26 to obtain how many Weekend days in each month. Select H1:H26 and write: =FREQUENCY((WEEKDAY(ROW(INDIRECT(A1&":"&A2));2)>5)*(12*(YEAR(ROW(INDIRECT(A1& ":"&A2)))-YEAR(A1))+MONTH(ROW(INDIRECT(A1&":"&A2))));ROW(INDIRECT("1:" & 1+12*(1+(YEAR(A2)-YEAR(A1)))))-1) <Ctrl+Shift+Enter>
Figure 13-9 Frequency to obtain the number of Weekend days in each month
Note: The first value (391) is the number of zeros in data array, corresponds to the total number of weekdays between A1 and A2 dates. The other values, except the last, correspond to the number of Weekend days in the corresponding month. A value of 10 means we have 5 complete Weekends in the corresponding month.
13-146
Step 5 â&#x20AC;&#x201C; To know the number of months with 5 Weekends, we need to count how many values, in the previous array, are equal to 10. Copy the last text formula and use it with Sumproduct (we can now delete the 3 auxiliary ranges) write in C1: =SUMPRODUCT(--(FREQUENCY((WEEKDAY(ROW(INDIRECT(A1&":"&A2));2)>5) *(12*(YEAR(ROW(INDIRECT(A1&":"&A2)))-YEAR(A1))+MONTH(ROW(INDIRECT(A1&":"&A2)))); ROW(INDIRECT("1:" & 1+12*(1+(YEAR(A2)-YEAR(A1)))))-1)=10))
Figure 13-10 Number of months with 5 complete Weekends
13.4 HOW MANY MONTHS WITH 5 COMPLETE FRIDAY, SATURDAY, SUNDAY SEQUENCES The formula can easily be adapted to count how many months with 5 complete sequences of Fridays, Saturdays, and Sundays. Note that can only happen in months with 31 days when the beginning of the month is a Friday. In the previous formula replace >5 by >4 (to include Friday) and =10 by =15. The complete formula will be: =SUMPRODUCT(--(FREQUENCY((WEEKDAY(ROW(INDIRECT(A1&":"&A2));2)>4) *(12*(YEAR(ROW(INDIRECT(A1&":"&A2)))-YEAR(A1))+MONTH(ROW(INDIRECT(A1&":"&A2)))); ROW(INDIRECT("1:" & 1+12*(1+(YEAR(A2)-YEAR(A1)))))-1)=15))
Figure 13-11 Number of months with 5 complete Friday, Saturday, Sunday sequence
13-147
14 A MODEL FOR MULTIPLE CHOICE TESTS In this Chapter, weâ&#x20AC;&#x2122;re going to create a Model to obtain the Scores for Multiple Choice Tests.
14.1 WITH NO PENALTY FOR WRONG ANSWERS Figure 14-1 will show the data for the model.
Figure 14-1 Data for the Multiple-Choice Test Model
C2:M5 contains the solutions for the 3 different type of exams (1, 2 or 3) C8:M17 contains the student answers, C8:C17 tells if student did type 1, 2 or 3. An empty cell means the student did not answer that question. We want the student score (%) in N8:N17 Not answering or a wrong answer will be, in this first case, treated as equal.
Step 1 â&#x20AC;&#x201C; Compare the first student answers with Type 1 solution. Use a row with a number of columns equal to the number of questions, for instance P3:Y3 We want 1 for a correct answer, 0 for a wrong answer.
14-148
Select P3:Y3 and write: =--(D8:M8=$D$3:$M$3) <Ctrl+Shift+Enter>
Figure 14-2 Compare First student answers with Type 1 solutions
Step 2 â&#x20AC;&#x201C; If the first student did other than Type 1 the previous results are incorrect. We should compare the answers with all possible solutions (Type 1,2 or 3) For that we need a range with number of rows equal to the number of Solution Types, for instance P3:Y5 Select P3:Y5 and write: =--(D8:M8=$D$3:$M$5) <Ctrl+Shift+Enter>
Figure 14-3 Compare First student answers with Type 1,2 and 3 solutions
Note that P3:Y3 shows the results of comparing D8:M8 with D3:M3, P4:Y4 shows the results of comparing D8:M8 with D4:M4 and P5:Y5 shows the results of comparing D8:M8 with D5:M5
14-149
Step 3 â&#x20AC;&#x201C; Make sure that the rows corresponding to the wrong type are ignored (they should show only zeros) Compare the student type (C8) with all possible types (C3:C5). This will result in a range with 3 rows and one column, only one of the rows (the one corresponding to the correct type) will contain the value 1, the others will contain zero. Multiply that by the previous formula. Select P3:Y5 and write: =(C8=$C$3:$C$5)*(D8:M8=$D$3:$M$5) <Ctrl+Shift+Enter>
Figure 14-4 Ignoring rows corresponding to the wrong Type
Note that as the first student did Type 1, rows 2 and 3 contain only zeros. The number of correct answers will be the sum of the previous array. To obtain the result in percentage we only need to divide by the number of questions. Step 4 â&#x20AC;&#x201C; Copy the previous text formula and use it with Sumproduct Write in N8 and then Copy Down till N17: (you can now delete P3:Y5 range) =SUMPRODUCT((C8=$C$3:$C$5)*(D8:M8=$D$3:$M$5))/COLUMNS($D$2:$M$2)
Figure 14-5 Scores for all students
14-150
14.2 WITH PENALTY FOR WRONG ANSWERS In this case we have a B8 value (%) penalty for a wrong answer. Not answering or a wrong answer will be, in this first case, treated different. Figure 14-6 shows the data fro the new model
Figure 14-6 Data for the Multiple-Choice Test Model with penalty for wrong answers
Step 1 â&#x20AC;&#x201C; Use the previous (no penalty method) and subtract the penalty Write in P3:Y5 =(C8=$C$3:$C$5)*((D8:M8=$D$3:$M$5)-$B$2*(D8:M8<>$D$3:$M$5)) <Ctrl+Shift+Enter>
Figure 14-7 Penalty for Wrong or empty answer
Note that I8<>I3 (U3 value -0.25), K8 is empty we have to remove this (W3) penalty
14-151
Step 2 – Apply penalty only if the answer is wrong but not empty So we will multiply the penalty expression by (D8:M8<>””) Replace P3:Y5 formula by: =(C8=$C$3:$C$5)*((D8:M8=$D$3:$M$5)-$B$2*(D8:M8<>"")*(D8:M8<>$D$3:$M$5)) <Ctrl+Shift+Enter>
Figure 14-8 Distinguish wrong answers from empty answers
Note that now U3 contains -0.25 (wrong answer) and W3 contains 0 (Empty answer) Step 3 – Copy the previous text formula and use it with Sumproduct Write in N8 and then Copy Down till N17: (you can now delete P3:Y5 range) =SUMPRODUCT((C8=$C$3:$C$5)*((D8:M8=$D$3:$M$5)$B$2*(D8:M8<>"")*(D8:M8<>$D$3:$M$5)))/COLUMNS($D$2:$M$2)
Figure 14-9 Scores for all students with penalty for wrong answers
14-152
14.2.1
A SHORTER MORE EFFICIENT SOLUTION
Instead of checking for wrong answers, weâ&#x20AC;&#x2122;re going to add B2 (penalty) value to correct answers and subtract the same value to all no empty answers. Our P3:Y5 formula should now be: =(C8=$C$3:$C$5)*(D8:M8<>"")*((D8:M8=$D$3:$M$5)*(1+$B$2) -$B$2) <Ctrl+Shift+Enter>
Figure 14-10 Distinguish wrong answers from empty answers Shorter/more efficient solution
And In N8, to obtain final scores: =SUMPRODUCT((C8=$C$3:$C$5)*(D8:M8<>"")*((D8:M8=$D$3:$M$5)*(1+$B$2)$B$2))/COLUMNS($D$2:$M$2) < Enter>
Figure 14-11 Scores for all students with penalty for wrong answers Shorter/more efficient solution
14-153
15 EXCEL 365 With an Excel 365 subscription you can access some new function that will make things much easier.
15.1 FILTER FUNCTION Many of the presented problems can now be solved with Filter Filter can be used to filter a range of data based on a specific criteria. Filter belongs to the category of Dynamic Arrays functions. Filter returns an array of values that automatically spills into a range of cells, starting from the cell where you enter the formula. Using filter to solve 9.2 problem (we want to obtain at G2, and down, the student names belonging to G1 class). We want B2:B11 values that obey C2:C11=G1 The formula at G2 will be: =FILTER(B2:B11;C2:C11=G1)
Figure 15-1 Using Filter
15.2 UNIQUE FUNCTION Unique returns an array containing the unique values from a range or array. With this function will be much easier to obtain the list of unique values and how many they are. Distinct classes at E2: =UNIQUE(C2:C11) If we have empty cells at C2:C11 we should use instead: =UNIQUE(FILTER(C2:C11;C2:C11<>"")) How many Distinct Classes at G2: =COUNTA(E2#) or, if we donâ&#x20AC;&#x2122;t have the unique values already : =COUNTA(UNIQUE(C2:C11))
Figure 15-2 Using Unique
Note the way we refer to the results of a dynamic formula (E2#)
15-154
15.3 SORT FUNCTION Sort can be used to sort a range or array The syntax is:
=SORT (array, [sort_index], [sort_order], [by_col]) Arguments:
array - Range or array to sort. sort_index - [optional] Column index to use for sorting. Default is 1. sort_order - [optional] 1 = Ascending, -1 = Descending. Default is ascending order. by_col - [optional] TRUE = sort by column. FALSE = sort by row. Default is FALSE. Problem: Obtaining the Top 5 Step 1 – Use Filter to obtain Name and Score for all Students with a score >= fifth large score Formula at E2: =FILTER(B2:C11;C2:C11>=LARGE(C2:C11;5))
Figure 15-3 Using Filter
Step 2 – Use Sort to sort by score (column 2) descending (-1) =SORT(FILTER(B2:C11;C2:C11>=LARGE(C2:C11;5));2;-1)
Figure 15-4 Using Sort
For those without Excel 365, you can find 3 similar UDF VBA functions in the TimeIt file (TFilter, TUnique and TSort) In the above examples you only need to replace Filter by TFilter, Unique by TUnique and Sort by Tsort.
15-155