At first glance, this does not surprise me at all. I once ran an analysis comparing cholesterol levels between two groups, with a total sample size of 200 and a median of 2.3 in both groups. The p-value from the Wilcoxon rank-sum test was 0.03, significant at the 5% level. The reason seemed to be that the two groups had different tails.
I disagree that the null hypothesis of the Wilcoxon test is that the medians are equal. More precisely, the test compares the distributions. For a non-normal distribution (as assumed here), the median is probably the most common parameter used to describe the distribution, but it is not the distribution itself.
In that test, the null hypothesis is that both groups come from the same distribution, and we test whether one group tends to produce larger values (higher ranks) than the other.
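To make this concrete, here is a minimal sketch in plain Python. The data are made up for illustration (they are not my cholesterol data), the function name `rank_sum_test` is hypothetical, and the p-value uses the large-sample normal approximation without tie or continuity correction. In practice you would probably use a library routine such as `scipy.stats.mannwhitneyu`.

```python
import math
import statistics


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney) test, normal approximation.

    Returns (estimated P(y > x), p_value). Ties count as half a win.
    No tie or continuity correction is applied, so this is only a sketch.
    """
    n_x, n_y = len(x), len(y)
    # U = number of (x, y) pairs in which the y value is the larger one.
    u = sum((b > a) + 0.5 * (b == a) for a in x for b in y)
    mean_u = n_x * n_y / 2
    sd_u = math.sqrt(n_x * n_y * (n_x + n_y + 1) / 12)
    z = (u - mean_u) / sd_u
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return u / (n_x * n_y), p_value


# Made-up data: both groups have median 2.3, but group B has a long right tail.
group_a = [2.3 + 0.01 * k for k in range(-50, 51)]
group_b = (
    [2.3 - 0.01 * k for k in range(1, 51)]
    + [2.3]
    + [2.3 + 0.05 * k for k in range(1, 51)]
)

median_a = statistics.median(group_a)
median_b = statistics.median(group_b)
prob_b_greater, p_value = rank_sum_test(group_a, group_b)

print("medians:", median_a, median_b)
print("estimated P(B > A):", prob_b_greater)
print("two-sided p-value:", p_value)
```

With these made-up samples the medians are identical, yet the estimated probability that a value from B exceeds a value from A is above one half and the p-value falls below 0.05.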
You could plot the distributions, for example as histograms, to compare the two groups visually. You may find that the two groups differ in skewness and other features.
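If a plot is not handy, a numeric check of skewness gives a similar picture. The following is a rough sketch on made-up data (two groups sharing a median of 2.3, one with a long right tail), and the helper name `sample_skewness` is hypothetical. It uses the simple moment-based skewness, which is an assumption on my part, as other definitions exist.

```python
import statistics


def sample_skewness(values):
    """Moment-based skewness: mean cubed deviation divided by sd cubed."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum((v - mean) ** 3 for v in values) / (len(values) * sd ** 3)


# Made-up data: group A is symmetric around 2.3, group B is right-skewed.
group_a = [2.3 + 0.01 * k for k in range(-50, 51)]
group_b = (
    [2.3 - 0.01 * k for k in range(1, 51)]
    + [2.3]
    + [2.3 + 0.05 * k for k in range(1, 51)]
)

skew_a = sample_skewness(group_a)
skew_b = sample_skewness(group_b)

print("median A:", statistics.median(group_a), "skewness A:", skew_a)
print("median B:", statistics.median(group_b), "skewness B:", skew_b)
```

Group A comes out essentially symmetric while group B is clearly right-skewed, even though the medians agree.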
All in all, the result is statistically possible and not very surprising.
JingJu