Mobile Menu AB Tested: Hamburger Not the Best Choice?

UPDATE: A second, larger test was run after this one; read about it here. (Hint: the hamburger icon still didn’t work so well.)

Recently I’ve been reading books about quantum physics (clearly I’ve got too much time on my hands).

I don’t profess to understand even a tenth of what I’ve read, but it’s fascinating (even if my friends shake their heads at the new levels of geekdom I’ve aspired to).

The subatomic world seems to exist in a mystical state of uncertain probabilities – UNTIL it’s measured and observed.

Exactly like user experience and usability.

We really have no idea at all how users interact with our websites.

Until we measure and observe, our assumptions are based on uncertain probabilities.

Previous Hamburger Icon Test

I did a previous test that seemed to show that a bordered “hamburger” (aka sandwich) icon was used more than other options.

hamburgertest1

The menu icon on the right was clicked more than the other two.

I then decided to test the hamburger icon against the word “MENU”.

A/B Test Conditions

The test was run against all mobile browsers across all pages.

The test ran for about 5 days and was served to around 50,000 mobile visitors.

The site’s demographics skew towards the 25–34 age group.

Demographics

Mobile visitors are split into the following:

  • iOS 64%
  • Android 34%
  • Windows Phone and Blackberry make up most of the remaining 2%.

Original (baseline)

Based on the results of my previous test, the site now has a bordered ‘hamburger’.

baseline

Variation 1 (MENU + Border)

2-variation1

Variation 2 (MENU + Hamburger + Border)

2-variation2

Variation 3 (MENU without Border)

I would never consider this implementation, but I wanted to test it and check my assumptions.

2-variation3

Results

hamburgertest2

As predicted, the word “MENU” alone performed poorly (though not as badly as I had presumed).

Interestingly, the bordered MENU button was clicked significantly more than our hamburger icon.

iOS vs Android

Another tracking metric I’ve set up is event tracking in Analytics. I record an event every time the mobile menu is clicked.

What the preliminary data is showing:

iOS users are 2-3 times more likely to tap a menu icon than Android users.
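The event tracking described above can be wired up along these lines. This is a minimal sketch, not the post’s actual code: the element id "menu-toggle" and the event category/label are illustrative, and analytics.js is assumed to be loaded (exposing the global `ga` function).

```javascript
// Minimal sketch: record a Google Analytics event on every mobile menu tap.
// Assumptions (not from the post): analytics.js exposes a global `ga`
// function, and the menu toggle element has the id "menu-toggle".
function trackMenuTap(send) {
  // category, action, label -- appears under Behavior > Events in GA
  send('send', 'event', 'Mobile Menu', 'tap', 'hamburger');
}

// Wire it up when running in a browser with analytics.js present.
if (typeof document !== 'undefined' && typeof ga !== 'undefined') {
  var toggle = document.getElementById('menu-toggle');
  if (toggle) {
    toggle.addEventListener('click', function () { trackMenuTap(ga); });
  }
}
```

Segmenting the resulting events by operating system in Analytics is what allows the iOS vs Android comparison below.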

Interpretations

Based on this and my previous AB test, a flat hamburger icon may not be ideal on a responsive website (remember this is a website not an app). Using the word MENU (and making it look like a button) could be more helpful for visitors.

This does not mean that users do not understand the hamburger/sandwich – it could be that the word MENU draws more attention.

UPDATE: New research from Nielsen Norman Group shows: Users are very familiar with the magnifying glass icon for search, but “Users are still unfamiliar with newer icons, including the three-line menu icon and the map-marker icon”.

UPDATE: A larger test (250k visits) was undertaken after this. Read about it here.

Further discussion: EricMobley.net

James Foster
James makes websites, you can follow him on @exisweb.
Filed under AB Tests Responsive Design
Updated: April 8, 2014

187 Comments

  1. Hi James,

    Very interesting results. What do you use to run your A/B testing?

    • I’ve been using Optimizely (only because I discovered I had a sizable credit sitting there).

      In the future I’m going with a more roll-your-own approach. When I get it working I’ll write up how I did it.

  2. Our site has the word “menu” + the hamburger, but without the border. I am curious what that combination would yield.

    • I wonder if that could be confusing for users. Are they two different functions – a menu or something else? Or is it a single function?

  3. Really really interesting. I was talking about this subject on Facebook with some other designers a few days ago and was really sure that the burger was well established from now. I’m still sure that people perfectly know what it is about BUT the word menu (with the border which makes it explicitly a button) got bigger proportions, so it is easily clickable for mobile users (especially with big fingers). I totally agree with your conclusion.

    • I think the burger is established. But it’s not the last word in terms of indicating menu-type functionality.

      I personally have got frustrated with apps that have very small icons that are hard to tap.

  4. I agree with everything! Including the micro science, or ‘quantum physics’, which is a silly name.

    I’m not surprised about the results, but it’s good that there’s now proof. Statistics and lies, right?

    I guess I should start following this blog.

    • I was about to start writing about where the word quanta came from (I believe it was Max Planck), but I’m way off topic.

      Thanks for your comments. Take the results with a grain of salt, as there are always so many variables, but it’s good to test.

  5. Hey James,

    Do you have any idea how response time / network latency plays into the numbers and if they line up relatively evenly across the 4 variants? (just to keep the suspicion from your last test sample size going, though understandable w/ that pricing)

    cheers

    • Good question Steven. That kind of data is not available, but there is no reason why there would be any difference across variants.

      This particular site that I tested on is one I’ve spent a lot of time optimizing for performance. Regular monitoring with Pingdom gives a consistent render time of around 450ms. JavaScript is all async, and it uses a CDN.

      One thing I can say is that performance and usability on low-powered Android phones is a concern. I have a rubbish android phone that I use for testing, and it’s not a pleasant user experience. This could account for the lower engagement on Android phones.

      • Use Analytics to filter results for only 4.x android phones, then?

  6. I’m new to A/B testing and multivariate testing. Do the plus/minus ranges matter here? My gut is that they do, but I didn’t see them mentioned.

  7. Thanks for sharing, this is an interesting post–and something we were discussing in a meeting just last week. You wrote: “Another tracking metric I’ve setup is event tracking in Analytics.” Which ‘Analytics’ program are you using for that event tracking (menu button clicks) specifically? Just curious :)

    • Sorry, I should have specified – that’s with Google Analytics.

      Let me know if you want any more info on event tracking etc.

  8. Of iOS and Android users, did one group show more result variance between your test conditions? Or were they roughly equally helped/enticed/confused/mystified across all test conditions?

    • As there were far fewer interactions from Android users, the data becomes a bit sketchier.

      Ironically the most clicked variant from Android users was actually the burger + menu + border combo.

  9. Thank you for sharing your results. As someone who’s been running tests for a number of years, I want to caution everyone from drawing conclusions from your results. First, it should be stressed that this is data for your users only, and that may not be representative of other populations. Second, you mentioned you achieved significance but not to what extent.

    Until such tests are run on a wide variety of populations with similar outcomes, it’s hard to definitively make conclusions. Everyone should be running experiments on their own user base before adopting changes, and most certainly so in ecommerce where sales are on the line.

    • Wise words.

      The variables involved in these tests are myriad.

      For example, occasionally we get traffic spikes to the site. If such a thing happens I need to pause/halt all tests immediately, as this will throw things way off.

      This particular site was a reasonable candidate as it has broad appeal.

    • Well said. It’s all too easy for people reading test results to impulsively change their own site in the hope that they will get the same results. Test! Test! Test it on your own site first! did I say Test?

      • Did you just say test?

  10. If the border makes the button more clickable, more easily perceived as a button, I’m curious if adding a shadow would make it even more so.
    Everyone is into “flat” design now, but I personally often find it harder to identify buttons etc. Affordance/signifier quality is significantly weaker, imho.

    • Perhaps I will do some other tests on both desktop and mobile nav buttons to see if shadows make an impact. Funnily enough I removed all the shadows about 4 months ago.

      I guess I got so used to seeing flat design that the shadows seemed too busy.

  11. Fascinating results, thanks for sharing them. The hamburger menu seems to be still finding its feet with users, while developers and designers have jumped into using it as a de facto solution. More testing and information like this is required.

    • Thanks for sharing your results James.

      I wonder how much desktop browser choice has to do with recognition of this particular pattern. Since iOS users were 2-3x more likely to interact with the standalone menu button, I wonder if they’re predominantly using Safari instead of Chrome (since Chrome implements a standalone hamburger pattern).

      Out of curiosity, what’s the breakdown of browser usage on the test site(s)?

  12. Thanks for sharing the results! I had a gut feeling about this, but the numbers are a definite incentive to inspect that item further in our own designs.

    About quantum mechanics (I’ve got a M.Sc. in Physics), there’s nothing mythical in the state of uncertainty. The gist is, that the act of measuring itself will influence the system, since it’s so tiny. Beforehand you have some probabilities at hand, how a particle/wave might react. Then you measure, and in quantum mechanics measuring always changes the system. The final result directly represents, _how_ you measured the particle.

    This has a counterpart in UI testing, too: If you measure towards achieving a certain goal (like conversion optimization), make sure to _measure the right things_.

    • Awesome. As I said, I could barely grasp the basic concepts of quantum theory, but it’s fascinating.

      I think a lot of AB testing (including my own) is useful to look at, but must be applied judiciously on an individual basis.

  13. It would be interesting to see if this works as well in some other languages.

  14. I am proudly a proponent for utilitarian design. I believe beauty is intrinsic to design and that there is no need to force aesthetics, simplification or abstraction; in this case, sometimes a button that says “menu” is just down right superior to a (potentially ambiguous) pictograph.

    There are so many design patterns and industry trends that appear to be design for designers — instead of thoughtful and pragmatic design, for the target user.

    Thanks for sharing.

    • You’re welcome. The word menu also worked better on a desktop layout test I did about a year ago. “Hamburgers” on desktop did not work well at all.

      • “Hamburgers” on desktop did not work well at all.

        Could you possibly send me the link to these test results? I would like to see the distribution of results. I speculate there will be more overlap from mobile to desktop as mobile devices begin to outnumber desktops. This is due to a need to minimize the cost of delivering applications, but also to companies exploring ways to unify designs across device ecosystems. Increasingly, you are seeing variations of hamburger menus in desktop experiences from Apple and Google, just to name two prominent software companies.

        • Unfortunately I don’t have any details of that test (it was over a year ago, and before I started this blog).

          I notice that Firefox (29) just placed a prominent hamburger on the menu bar, and of course Chrome has had this for a long time. So yes I concur that hamburgers are appearing more on desktops. Perhaps at some point I may do another desktop AB test.

  15. Hi – this is very interesting.

    Please can you explain what you defined as conversion? Was it clicking on the menu button or some other measure? Thanks.

    • A conversion is a user tapping on the button/icon.

  16. #1: In your tests, 97.5% ±0.5% stayed on the page and did not need/want/know to select the Menu option. I imagine that a blinking, red MENU icon would get more hits, but it’s not clear that’d improve the user experience of the site. So why measure the small fraction who select secondary info/commands, without considering either how important it was to leave your home page, and how distracting the menu was for the huge majority of people who (presumably) didn’t want to be pulled away?

    • Yes. If conversion is just the number of people who clicked the button, it’s not about best or worst but about most attention and least attention. This is information that can be used to shape the design to meet business goals, so while it is interesting, it shouldn’t become a pattern that words are better than hamburgers.

    • Yes! As I commented on the previous post – it’s about intent. Why should a user need to consult another page after landing on your site?

      Did they not find what they wanted (in this situation I suspect they leave the site rapidly). Or maybe they wanted to see more.

      This is very hard to measure.

  17. #2: I think your smushing of one of physics’ most profound principles bears on your tests. Per Wikipedia, “…Heisenberg stated that the more precisely the position of some particle is determined, the less precisely its momentum can be known, and vice versa.” In your case, measuring users’ response to different menu formats necessarily changes their expectations about what they see on screen.

    I imagine (don’t know) that iOS users are much more likely to recognize the hamburger icon in the top left, because they’re used to seeing it on a few of their apps, while I’m not so aware of a coherent menu principle for Android. If you have frequent visitors to your site, they’ll eventually learn (subconsciously or explicitly) what the icon is for, using it on repeat visits.

    Last night in an app I use frequently, I went thru the hamburger to the menu, but it didn’t help me find what I wanted. I’m conditioned to think of it as a menu, but that doesn’t mean the fast food I want is on it.

    Still: keep up with the (constant) task of trying to understand what your users want. It’s the mark of somebody whose arteries haven’t hardened and is out trying to serve needs, rather than hold on to business.

    • Yes so many variables and so hard to draw conclusions… but always worth digging deeper. I took my physics books back to the library today, so who knows where the next inspiration will come from.

    • I’ve grown to accept that menu icons will vary in appearance from site to site or app to app. I believe this debate will always be subject to the current design trends, but my general conditioning expects the primary menu to always be located top-left or top-right.

      I’d like to see some data that compares appearance vs position.

      • I would be curious to know how much the placement of the menu in the top-left or top-right has to do with how your user base uses their phone.

        I remember reading some research last year that argued that since the majority of the planet is right-handed (I’m left-handed!), the menu button (on apps or the small screen of a responsive site) should be placed in the top-right.

        Not sure I agree with that, especially after research I read this week (that’s from 2013.. url: http://www.uxmatters.com/mt/archives/2013/02/how-do-users-really-hold-mobile-devices.php) that doesn’t necessarily draw the conclusion that the other research did.

  18. I think you’ve missed the point — if we wanted to keep a “MENU” in our UI, we would, because it obviously works and nobody argues about that. The point of the “hamburger icon” exercise was rather to find a suitable visual metaphor for mobile navigation if we’re constrained by size.

    • Yes, good point, but I would suggest that there are many sites using the hamburger because it seems to be the thing of the moment. I’ve seen a lot of these icons appearing on desktop layout sites where there is no such size constraint.

      Good discussions, thanks for your input.

  19. Unfortunately your test results are not valid as your sample size was not large enough for the conversion rate tested. I ran your numbers through a chi-squared formula and, assuming 95% confidence level, the only certain conclusion is the plain “menu” label performed worse.

    Your sample size would only demonstrate a valid increase for the bordered menu button at an 88% confidence level, which just isn’t good enough and introduces too many potential testing errors.
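For anyone wanting to run the same kind of check on their own numbers, a two-proportion z-test (closely related to the chi-squared test this comment mentions) can be sketched as follows. The counts are purely illustrative, since the raw per-variant numbers aren’t published here:

```javascript
// Sketch of a two-proportion significance check like the one described
// above. The counts are illustrative only; the post does not publish the
// raw click/visitor numbers per variant.
function zTestTwoProportions(clicksA, visitorsA, clicksB, visitorsB) {
  var pA = clicksA / visitorsA;
  var pB = clicksB / visitorsB;
  var pooled = (clicksA + clicksB) / (visitorsA + visitorsB);
  var se = Math.sqrt(pooled * (1 - pooled) * (1 / visitorsA + 1 / visitorsB));
  return (pB - pA) / se; // |z| > 1.96 corresponds to 95% confidence
}

// Illustrative: a 1.6% vs 1.92% click rate over 12,500 visitors per variant
// gives a z just under the 1.96 needed for 95% confidence.
var z = zTestTwoProportions(200, 12500, 240, 12500);
console.log(z.toFixed(2)); // prints "1.92"
```

This also illustrates why a result can look like a clear winner while still falling short of the conventional 95% threshold.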

    • What I would like to do is sort out my own AB testing configuration so I can run some really long experiments and get some better results.

  20. Can you tell us what “Chance To Beat Baseline” percentage Optimizely reported for the variations?

    • 94.2% for MENU + Border
      76% for MENU + burger + border

  21. Hi James

    Really interesting results. I’d be fascinated to see if the pattern was the same if you ran the same test a second time. It does seem a useful set of results, and if it repeated again then I’d be very inclined to roll with it.

    Cheers
    Matt

  22. It’s a really interesting experiment. I think the conclusion you can make is “For some reason people click the text ‘menu’ more frequently than the abstract pictorial icon”, which is already interesting. However, you could’ve gone further by investigating how many clicks on ‘menu’ were valid – the percentage of users who found the right function in the menu versus those who closed it again. In terms of UX design, clicking ‘menu’ frequently does not always indicate a better experience.

  23. Have you tested variations of nav?
    Working on a few restaurant sites where menu wouldn’t work for navigation.
    Just wondering how recognizable nav is.

    • I did test some variations last year, but that didn’t include the word nav. I suspect that outside of the developer world, your average users would not understand that (assumption – would need to be tested).

  24. How good the data is is less interesting than the discussion it is generating. The hamburger has been adopted widely but with little thought. Even Firefox is getting it. Even desktop sites with plenty of space opt for it. It’s a useful visual metaphor to have, but one to be used wisely, as all elements should be.

    Three things.

    “Menu” in other languages. Could cause even more space problems.

    Talking about the hamburger with users when I’m doing support is a royal pain in the bun. “Yes, you see those three lines in the top left, click that, it’s a menu, we call it a hamburger… I know… just a name. OK I know the middle line isn’t narrower than the top and bottom but…” and so on. Hamburger isn’t a great name to be using when talking to some customers during support. Calling it “menu” doesn’t help, they tend to look at their browser or OS menu in that case.

    Thirdly, when in Chrome you end up with 2 hamburger menus. In the chrome and in your site.

    • You are right that the hamburger is beginning to creep into desktop layouts. So maybe over time there will be wider understanding of it.

      I wish we had a better name for it…

      • In regards to the term Hamburger, I have also heard and used the term “Shoulder” menu; referring to the top left location of a screen element in relation to the same abstract positions on a person’s body, i.e., “Header”, “Footer”, “Body copy”, etc.

        • Interesting. Different names for this icon: hamburger, sandwich, hotdog, list, and shoulder.

    • And if you use Windows 7, the three lines are a metaphor for “grippable area”, as is visible on a scrollbar. Easy to forget if you’re on a Mac or are using Windows 8. So its visual meaning is even more muddied.

  25. I’ve had similar results when testing the hamburger vs a ‘menu’ button on two separate occasions.

    • Do share your stats from the test. I’d be interested.

  26. Mr. Foster, this is wonderful! Thank you for taking the time, sir.

    • You’re welcome. Hope you got something useful out of it.

  27. Great writeup

  28. Thank you for this! I’d personally love a post about how you create your event tracking for google analytics, and what sorts of things you normally track.

    • Great idea. I’ll write something up at some point.

  29. Thanks for the energy you’ve put into the tests and the comments to your posts.

    I wonder, though. I wonder if this type of test can only indicate that 100% of the people who clicked your button clicked your button?

    It can’t tell if they wanted to click.

    It can’t tell if they clicked by accident.

    It can’t tell if a group of users needed the after-jump content but couldn’t work out what to do.

    It can’t tell if a group of users felt they didn’t need the after-jump content.

    It can’t tell if a group of users didn’t trust the quality of the after-jump content implied by the attention styled into the click target.

    Don’t get me wrong. I love user testing.
    I love that your data is in line with my belief that a full pendulum swing away from skeuomorphism is just as bad as the full swing toward skeuomorphism.

    Maybe we need to run some AB tests on how useful AB tests are? :D

    • Yes to all that. I think the best I can say about the results is — you don’t have to do a thing if it appears that everyone else is. You can test it, and maybe have an alternative, but the data is murky.

      Maybe we can AB test the quality of comments and how long a user stays on a page :)

  30. I would be very interested in a deeper comparison on the smaller platforms. Specifically Windows Phone, since they don’t really have apps that use the hamburger – and deploying HTML apps with native wrappers often makes one go to the hamburger icon since it works ok on the two biggest.

    • Windows phone is difficult as it makes up such a small portion of visitors. I guess you could run the test for a long time (like a month).

  31. Don’t forget to be careful about making such assertions without checking the statistical significance of your result. Using a test like http://getdatadriven.com/ab-significance-test you can see that the differences between the baseline and the Menu+Hamburger+Border, and between the baseline and the Menu+Border, are not significant at the 95% level. (Menu without Border is significantly worse though.) It would be better to keep running the test to gather more data to be more sure about your conclusions.

    • Until I can make a cheaper (i.e. free) way of doing my testing, longer tests are out of budget. But yes, agree totally.

      • James: just to be clear, you agree the results of the test may simply be random variation and mean nothing at all? Unfortunately, that’s what we’re saying. But please don’t take this as a harsh criticism. You have to start somewhere :-)

        • Criticism is absolutely fine – as long as it is helpful. I’ve learned so much from the awesome discussions following this post. I would never have that if no-one had read.

    • I think the fact that he’s published these results means he doesn’t understand what the null hypothesis is, and also what your point is. He’s running the results at about 80% confidence by my calculations.

      This may be the Dunning–Kruger effect in effect, but I think it’s good he’s at least tried some testing.

  32. I wonder if one difference in behavior between iOS and Android users is that for much of its existence, Android has had a distinct menu button or function. Users have gotten used to that functionality, whereas iOS users have had to always use a button provided in the software to access a menu.

  33. Nice write-up.

    I’ll ignore the issue of whether your statistics are sound, and instead say that if you achieved this result, then the best thing to do is to re-test against your hypothesis. If you think the border was what did it, then make the border bigger, fatter, or something like that, then test again against the other variants. If it wins bigger, then you’ve got a stronger hypothesis. And if it doesn’t (loses, flatlines, etc.) then maybe your hypothesis is wrong. Point being: one data point does not a conclusion make :-)

    Basically, you have to test and learn against an ongoing hypothesis, otherwise you may as well just not test and rely on your gut. I think a lot of people don’t grasp that point.

  34. Like how you feel you can relate quantum theory to UX design (:

  35. Very interesting, could you test with the menu icon on my website? It’s three vertical dots, and I’ve always wondered.

    • Hi Daniel, that is an unusual icon, but it does appear in a few places (as kind of a “more…” button). I suspect it is significantly less understood than the hamburger.

  36. Great test, I noticed the other day that Chrome changed the three line icon on the far right from grey lines to orange lines. Have you done color tests and different background tests?

    • I haven’t tested different colors. It would have to be a completely new test – looking at colors only (not mixing colors and borders etc).

    • They didn’t change the colour of the icon per se; that particular icon acts as a notification. Mine was green a minute ago, which signified that a browser update was ready to install. I have seen it orange too but am unsure what that signifies; I’d guess it means an update is pending or something.

      Anyhow after the update it went back to grey :D

  37. Thanks for sharing the follow-up results. Very interesting indeed.

  38. Great article!
    Chrome gets the orange lines when it needs updating I think.

  39. Forgive me if I missed it, but it doesn’t look like you tested a specific task, you just monitored the use of the affordance. Is that true? I think Craig Sharkie’s comment is alluding to this. Without knowing anything about why a particular user visited your site(s), or what they were trying to accomplish, I wouldn’t be able to draw any actionable conclusions from the study.

    The results of your study seem to confirm that an icon alone is not going to convert any users. If I visit Nordstrom’s mobile app while casually surfing the web with no goal in mind, but I end up making a purchase, how would we credit the design of the icon as the cause? You could try and correlate rate of purchase with each icon, but you don’t have insight into user motivation or intention.

    A smaller number of qualified participants taking a task-based think-aloud test and responding to a few Likert Scale statements would probably give you better insight into caffeineinformer’s users (and design) than analytics alone.

    • And there you are getting to the root of content-focused sites in general. What exactly is the goal? It’s not to buy a product, or to sign up to a service. So what exactly are you measuring? What is a ‘conversion’ in the broadest sense of the term?

      If one goal is to help users delve deeper into the site content, then perhaps giving them the easiest path possible to finding that content is a success. Of course if they have already found what they needed (such as an answer to a question), then it’s unlikely they’ll go looking for anything more.

      Collecting click-through data blindly for something like a UI element doesn’t tell you anything. Were the users looking for something specific that might be likely to live in a menu (such as account information)? If not, this test only tells me that perhaps the text with border is more “explorable” by users who are just clicking around. But it doesn’t tell me with any certainty that if a user was actually looking for a specific item hidden in the menu (and which they were sure must be somewhere) they wouldn’t be able to find it with the hamburger or without the bordered text.

      A/B testing is more appropriate for tests such as seeing which advertisement drives more click-through. That advertising destination wasn’t where the user wanted to go when they landed on the site, yet they were enticed to click on it. That is a different situation than with UI elements, which are there when you need them for a specific purpose (such as to find something in a menu). That can only be counted as a “conversion” if they are supposed to find the menu.

      Bottom line – numbers (i.e. data) aren’t *inherently* meaningful. Just because something can be measured doesn’t mean that you learn anything by doing so. You need to have the right context. I believe that this experiment is misguided.

      • I don’t agree. Clearly a click does not tell you user intent, but it can be a useful measurement.

        If I’m looking for increased time on site, engagement, more pages per visitor then this will tell me. Whether the user clicked it because they wanted to find something, or just like the look of it and wanted to click it — they are still engaging with the site.

  40. It’s not a “hamburger icon”. It’s a “list icon” (it is quite literally a visual representation of a list of items).

    Just……..stop……..no……bad developer! Now go sit in the corner and think about how stupid you sound.

    • Hamburger is a stupid name.

    • Yeah good article there.

  41. The best (and little mentioned) part of this is that iOS itself has gone the route that did the worst in your test.

    • Agree that maybe iOS went a little too flat. You end up trying to tap anything to see if something happens.

  42. Thanks for the statistics. I think the bordered menu won’t harm the look and feel that much, and I think it’s as important as the logo. So adding a bit of weight either with borders or a background color is a valid solution.

  43. Interesting indeed! I have my doubts on whether these differences are in fact significantly different. They appear so small! What alpha did you use?

    • The test was not big enough. A typical alpha for AB tests is 0.05. This test failed to meet 95% confidence (it hit 94.2%, and my budget with Optimizely ran out; I’m in the middle of a bigger, tighter test with some in-house code). Draw conclusions at your own risk.

      To be honest, if I’m making a 30 second change to a page on a site, then I’m willing to go with a low confidence. Obviously if it was going to be a big costly change then I’d want very high confidence.

  44. You know, there’s nothing wrong with learned behavior. I think as this icon grows in ubiquity, this is a non-issue.

    • In time yes. But if the icon indicates a list, the question is: a list of what?

  45. One thing that hasn’t been mentioned is Google’s practice of using three vertical dots as a substitute for the menu list, while still leaving its search “magnifying glass” in the app(s) for its Gmail implementations on Android, apparently beginning at about Android 4 or perhaps a little later. What the magnifying glass accomplishes is apparently the classic function of searching through all the email to which it has access. For example, I entered the full name of one member of the research team I’m on and it returned his direct emails to me, our boss’s emails to me (on the occasions when he was copied), etc.

    I’m jes’ sayin’….

    virginia

    • Virginia, the 3 horizontal (or vertical) dots is a whole ‘nother icon that has slowly been appearing. I would like to test that one.

  46. Hi James,

    Thank you for this nice UX post. We’re using a mobile framework with the standard hamburger + border setup, but on top of that we are using a red notification badge (iOS7 style) with a number on it. We are a mobile app company and use the notification to show the number of apps in our portfolio. New visitors know they can tap on it and will see the menu. It’s a nice one to test as well.

    • Having a notification badge is like bees to honey. But it would be interesting to test clicks/taps with and without the notification.

  47. I would love to see this test in other languages outside of English. I wonder how translating (length of word) would play in this test or if a word could be misinterpreted.

    Are you continuing this test with other words like “navigation” or “nav”?

    • Kevin,

      I did once do a test on desktop with the word “nav” vs “menu”, the word nav performed so poorly I pulled the test quite early.

      • Interesting – thanks.

        Curious to know if you plan on taking this test further and doing it in other languages?

  48. Hi James,
    thanks for the interesting report on your tests.
    It would seem you are onto something here, but I would like to raise a couple of points.

    The first, as already pointed out, is that the results cannot be generalised beyond your particular website, since it defines a particular segment of the general user population (and a highly caffeine-conscious one :) ) which may not represent a fair general sample.

    Secondly, from a research design point of view, for the results to be meaningful it is necessary to know the natural variability in user behaviour, i.e. how noisy the background signal is relative to the sample size, and whether a difference of a couple dozen units can be considered meaningful or could have occurred by chance. Also, I think the difference should be calculated as a fraction of total visitors, not as a fraction of clickers, and is therefore on the order of 0.3%. To verify that this means anything, it would be necessary to run an analysis of variance against a null hypothesis.
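For click counts like these, the standard "could it have occurred by chance" check across all four variations at once is a chi-square test of independence, close in spirit to the analysis of variance this comment suggests. A stdlib-only sketch with placeholder counts (the post doesn’t publish its raw data):

```python
def chi_square(observed):
    """Chi-square statistic for a list of (clicks, non_clicks) per variation."""
    row_totals = [c + n for c, n in observed]
    total_clicks = sum(c for c, _ in observed)
    total_non = sum(n for _, n in observed)
    grand = total_clicks + total_non
    stat = 0.0
    for (c, n), rt in zip(observed, row_totals):
        # Expected counts assume every variation shares the same click rate
        for obs, col_total in ((c, total_clicks), (n, total_non)):
            exp = rt * col_total / grand
            stat += (obs - exp) ** 2 / exp
    return stat

# Illustrative counts: ~50,000 visitors split evenly across 4 variations.
# With 3 degrees of freedom, the critical value at alpha = 0.05 is 7.815;
# a statistic above it means the spread is unlikely to be pure chance.
stat = chi_square([(600, 11900), (660, 11840), (650, 11850), (540, 11960)])
print(stat > 7.815)
```

Whether the real data clears that bar depends on the actual counts, which is exactly the commenter’s point.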

  49. I wonder if any users assumed that “menu” referred to a list of drinks or foods, given that this site is related to caffeine. The combination of the border (indicating something clickable) with the word “menu” (in the context of food) might have generated more results than the 3-line icon alone.
    Would be interesting to try the same test but on the Search function.

    • Oops – guess the search button wouldn’t help much :). Would be interesting to try it with a site not related to food/drink.

    • That’s one thing I considered. Before Microsoft Windows appeared, a menu was something you got in a restaurant. If you went to a restaurant site, MENU would be very confusing: navigation or food?

  50. Especially a burger restaurant ;)

  51. Hi James,

    Thanks for sharing this. It will always be helpful in future, at least until the next usability test (with almost the reverse results).

  52. Hi James,
    I’d like to point out that the conclusions you have come to are based on inadequate data, and are therefore null and void, as you did not reach adequate statistical significance in your tests. The only variation you’ve proven performs worse is the ‘Menu without border’.
    A basic calculation proves this: http://www.usereffect.com/split-test-calculator.
    Thanks

    • Which is why I did another much larger test here. This second test had only 1 variation and achieved statistical significance.

  53. Adding color to the text only version would be interesting.

  54. Fantastic test.

    P.S. Pretty please put published (and edited) dates on your blog posts. A pet peeve of mine is having to hunt around blog posts to find out whether they’re current or old (which often implies irrelevant) ;)

    • Yeah good idea. I’ll see to it.

  55. Hi James,

    Thank you for sharing your work. The effectiveness of the “hamburger” icon is something I’m often asked about by clients who I run lab-based usability testing for. What I can say from my own observations over several studies is that the icon is not universally understood and can contribute to users not discovering content that can only be accessed via the menu.

    I’m not 100% familiar with the tools and methods used in A/B testing, so this may be a naive question, but are users in the test you describe given a specific task to attempt when they participate?

    Thanks again,
    Steve

    • Thanks for the comment.

      These kinds of A/B tests are effectively “blind” tests. The user will be served 1 of 2 (or more) variations, then I measure the outcome.

      To get any sort of statistical significance I need a large sample set – otherwise randomness could account for differences in behaviour. That’s why I did a second test, much larger over here.

      I am no expert in A/B testing. I was just trying to help my users engage more with the content, and I thought I’d share the results.

      It’s interesting that Google are pushing out a kind of “half hamburger” to some of their tools.

      • Thank you! What instructions is the user given when they start the test? And when you say “measure the outcome” what exactly does this type of test measure? Is it the first click on the page or something else? Thanks again, the reason I ask is that it would be great to point clients to this type of work if it answers their questions about burger menus!

        • With A/B tests, users are not informed. One user will see one kind of button; another user sees a different version. In this case the number of clicks (or taps) of the button was measured. I also use other data from Google Analytics (such as time on page, bounce rate, pages per user, etc.).

          In this particular test I measured the first click/tap of the menu button. So, more people clicked the button when it said “MENU” than when it showed the hamburger icon. As so many of these comments point out, interpret what you will. We cannot measure intent with A/B tests. A user may have been curious about a particular button and just clicked it. Perhaps their intent was not to drill down into the site navigation.

          We just can’t assume the intention, we are only measuring clicks.
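The blind assignment described in these replies can be sketched roughly as follows. The bucketing scheme, variation names, and counters are purely illustrative, not how Optimizely or the author’s in-house code actually works; the key properties are that each visitor is deterministically assigned one variation (so they see the same button on every page view) and that only the click itself is recorded:

```python
import hashlib

# Hypothetical variation names for the four buttons tested in the post
VARIATIONS = ["hamburger", "menu_border", "menu_hamburger", "menu_plain"]

def assign_variation(visitor_id: str) -> str:
    """Hash the visitor id so the same visitor always lands in the same bucket."""
    digest = hashlib.sha256(visitor_id.encode()).hexdigest()
    return VARIATIONS[int(digest, 16) % len(VARIATIONS)]

clicks = {v: 0 for v in VARIATIONS}

def record_menu_click(visitor_id: str) -> None:
    # Only the tap is measured -- intent, as noted above, is invisible.
    clicks[assign_variation(visitor_id)] += 1

record_menu_click("visitor-42")
print(clicks)
```

Comparing the per-variation click counts (against per-variation visitor counts) is then the entire measurement; everything beyond that is interpretation.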

  56. Hi,
    On a restaurant website (responsive), the word ‘MENU’ refers to something other than a nav menu, so can we use the word ‘Navigation’ with a border, or just ‘NAV’ with a border? Is there any test about that?

    • Maybe words like MORE or HELP. I did once test the word NAV and the result was terrible.

  57. I hate the Android vertically sliced ‘half hamburger’. When I first got an Android phone I thought it was a design flaw.
