Why do 30% of all positive integers start with 1?

Craven de Kere wrote:

Bill explained it.

YES! Thank you so much for understanding that gobbledygook, Craven. I am not a mathematician; so far from one, in fact, that I can't understand g__day's response to my answer any better than I understood the question. :embarrassed:

I was sure I had the calculations right, and your tiny statement kinda vindicates for me the 2 futile hours I spent trying to write it in equation form.
Thanks again! :wink:


I understand how it can hold true for the average of random numbers, or an average of sets. For example, take the sets 1, 1 and 2, 1 through 3, 1-4, 1-5, 1-6, 1-7, 1-8, and 1-9. The chance of a number in the first set starting with 1 is 100%, in the second it is 50%, and in the last it is 11%. So the average over all of them would be about 30%. And that pattern would hold as I increase the max range. So that makes sense if you were to randomly pick a set of numbers. But it doesn't seem to work if you just randomly pick A number. Am I going about this wrong?
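The arithmetic in this post can be checked with a short script. This is only a sketch of the calculation as described above; the function names are made up for illustration and are not from anyone's program in this thread.

```python
def leading_digit(n: int) -> int:
    """Return the first decimal digit of a positive integer."""
    return int(str(n)[0])


def fraction_starting_with_one(upper: int) -> float:
    """Share of the integers 1..upper whose first digit is 1."""
    hits = sum(1 for n in range(1, upper + 1) if leading_digit(n) == 1)
    return hits / upper


def average_over_ranges(max_upper: int) -> float:
    """Average that share over the sets 1..1, 1..2, ..., 1..max_upper."""
    shares = [fraction_starting_with_one(u) for u in range(1, max_upper + 1)]
    return sum(shares) / len(shares)


# The nine sets from the post: 1, 1-2, 1-3, ..., 1-9.
print(round(average_over_ranges(9), 4))  # 0.3143
```

For the nine sets listed, the average comes out at roughly 31%, which is in line with the "about 30%" above.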


Oh, I'm sorry, I seem to have missed one of Craven's posts, which makes a lot of sense to me. I still think it's a little beyond me, which is why I'm trying to figure out what everyone is saying.


Even so, I only understand it if you start at 1 working up.


Bill

You are correctly stating that the first digits are all mutually exclusive, so, writing P(i) for the probability of the first digit being i, the following relationship holds, for example:

P(digit 3) = 1 - P(1) - P(2) - P(4) - P(5) - P(6) - P(7) - P(8) - P(9)

But you inferred rather than proved P(i) > P(i+1) for all values of i (which is, by the way, true) and then rather elegantly showed the probabilities must be a decay curve whose 9 elements sum to 100%.

The trick to seeing that first-digit frequencies are logarithmic is to see that it's crucial to work out which groups you use to sum to infinity when showing that your results hold for all values.

Thomas showed that if you sum by {1..9} + {10..99} + {100..999} etc. to infinity the distributions should be equal. But what if he had, say, summed by any of

{1} {2} {3} or
{1..5} {6..10} {11..15} {16..20} or
{1..3} {4..5} {6..7} {8..9} {10..11} {12..13} {14..17} (i.e. prime breaks) or
{1..50} {51..100} {101..150} {151..200} or
{1..2} {3..4} {5..8} {9..16} {17..32}

You should get the same converging result for each of these groupings - which all count all positive integers. But you don't - they all converge to different answers, which is the rather stunning finding. Thomas was actually very wise to count by a power series centred on our base 10 number system, but a different grouping base (e.g. a power series in 2 or in 7) would have shown inconsistent convergence results again - meaning it is also flawed.

Only a logarithmic grouping (for any base) shows consistent convergence, and this was the proof Glass originally used.
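To see why the choice of stopping points matters, here is a rough sketch. It is not one of the groupings listed above and not the proof being discussed: it simply evaluates the running share of integers starting with 1 at three different families of cut-offs. The helper names are hypothetical.

```python
import math


def count_leading_one(upper: int) -> int:
    """Count integers in 1..upper whose first digit is 1, decade by decade."""
    total, power = 0, 1
    while power <= upper:
        # In the decade starting at `power`, the leading-1 integers are
        # power .. 2*power - 1, clipped to `upper`.
        total += min(upper, 2 * power - 1) - power + 1
        power *= 10
    return total


def running_fraction(upper: int) -> float:
    """Share of the integers 1..upper whose first digit is 1."""
    return count_leading_one(upper) / upper


for k in range(1, 7):
    at_9s = running_fraction(10**k - 1)       # stop at 9, 99, 999, ...
    at_19s = running_fraction(2 * 10**k - 1)  # stop at 19, 199, 1999, ...
    at_49s = running_fraction(5 * 10**k - 1)  # stop at 49, 499, 4999, ...
    print(k, round(at_9s, 4), round(at_19s, 4), round(at_49s, 4))

print("log10(2) =", round(math.log10(2), 4))
```

Stopping at 9, 99, 999, ... keeps the share at 1/9; stopping at 19, 199, 1999, ... pushes it towards 5/9; stopping at 49, 499, ... gives about 2/9. None of these plain running shares settles on log10(2), which is about 0.301.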


g__day wrote:

Bill

You are correctly stating that the first digits are all mutually exclusive, so, writing P(i) for the probability of the first digit being i, the following relationship holds, for example:

P(digit 3) = 1 - P(1) - P(2) - P(4) - P(5) - P(6) - P(7) - P(8) - P(9)

But you inferred rather than proved P(i) > P(i+1) for all values of i (which is, by the way, true) and then rather elegantly showed the probabilities must be a decay curve whose 9 elements sum to 100%.

The trick to seeing that first-digit frequencies are logarithmic is to see that it's crucial to work out which groups you use to sum to infinity when showing that your results hold for all values.

This is all way over my head. If the part I inferred rather than proved was that increasing the number of digits in the set made the result slightly more accurate while homing in on perfection (or at least to the realistically countable decimal point), that's not true. I just didn't show it. I proved it to myself by first dividing 111 by 199, then 11111 by 19999, and finally 11111111 by 19999999. This was sufficient to show me that it actually tightened up as the number of digits increased. I also realize that my map only showed the possibilities for "one", regardless of what the starting number was. I was too lazy to map out the other numbers, but assumed they would follow the same pattern, since the variation in the model only swung from 11.11% for "one" down to a minimum of 1.22% (or 11 divided by 899) for "nine". From here I could see that I could chart the whole progression, but it would be a pointless pain in the azz (especially since it already appeared to resemble the results you offered from your random number generator). Come to think of it, that's probably the part you meant I inferred? I apologize for writing this out, but I haven't taken a math class in nearly 2 decades, and never an advanced one. Hope you can understand what I tried to write. :embarrassed:

And thanks for the puzzle, it was fun. :wink:
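The divisions mentioned in this post are easy to reproduce. The snippet below is just a check of that arithmetic using exact fractions; the comparison value 5/9 is an observation about where the ratios seem to be heading, not something stated in the post.

```python
from fractions import Fraction

# The three ratios from the post: leading-1 integers up to 199, 19999, 19999999.
ratios = [
    Fraction(111, 199),
    Fraction(11111, 19999),
    Fraction(11111111, 19999999),
]

# Assumption for illustration: compare each ratio with 5/9.
target = Fraction(5, 9)
gaps = [abs(r - target) for r in ratios]

for r, gap in zip(ratios, gaps):
    print(f"{float(r):.8f}  distance from 5/9: {float(gap):.2e}")

# The "nine" figure from the post: 9 and 90..99 are the 11 hits up to 899.
nine_share = Fraction(11, 899)
print(f"{float(nine_share):.4%}")
```

Each ratio sits closer to 5/9 than the one before it, which matches the "tightening up" described, and 11/899 is about 1.22%.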


Bill

Your logic is actually very, very good and naturally intuitive!

P(1) > P(2) > P(3) ... > P(9) and we can say more

P(1) - P(2) > P(5) - P(6) > P(8) - P(9)
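These inequalities can be checked against the usual statement of the logarithmic first-digit law, P(d) = log10(1 + 1/d). The formula is not written out in this thread, so treat it here as an assumption about which law is meant; the function name is made up for the example.

```python
import math


def benford(d: int) -> float:
    """First-digit probability under the logarithmic law, for d in 1..9."""
    return math.log10(1 + 1 / d)


probs = {d: benford(d) for d in range(1, 10)}
for d, p in probs.items():
    print(d, f"{p:.4f}")

# The nine probabilities sum to 1 and decrease from digit to digit.
total = sum(probs.values())
decreasing = all(probs[d] > probs[d + 1] for d in range(1, 9))

# The gaps between neighbouring digits shrink as well.
gap_12 = probs[1] - probs[2]
gap_56 = probs[5] - probs[6]
gap_89 = probs[8] - probs[9]
print(total, decreasing, gap_12 > gap_56 > gap_89)
```

Under that formula P(1) is about 30.1% and P(9) about 4.6%, and both chains of inequalities hold.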

It's only the number groupings themselves that are tricky little buggers - they hide the irrationality of the puzzle so well it goes unnoticed (to the point that many fraudsters are now being caught).

Had you chosen other boundaries (yours were 199, 1999, 1999999 etc) further away from a power series, you would have seen slightly different results. The trick is to realise this and examine just how moving the boundaries on your number groups changes things. This small insight originally took number theorists by surprise.


I opened the computer calculator (more digits) and I see what you mean! Neat trick, I won't forget it.


Quote:

My position: This axiom is about sets. That's the only way it ever makes sense because it relies on sequence.

Oh! Ah! (Needless to say, I totally and completely misunderstood what was going on here. I think it's starting to dawn on me, thanks to Bill and that line I quote there.)


Random integer?
g__day: I understand that an 'average, limited set of natural numbers from 0 to N' will have a first-digit distribution according to Benford's law; I understand that if you consider the limiting frequency of 1 as the first digit, the only way it really converges is by logarithmic baskets.
I understand that 'real-world' data, on average, obeys this law - the sets have random upper limits.

I don't understand one thing though: how do you choose 'a random integer'?
There is no such thing as a randomly-chosen integer (as Craven already pointed out, but I am rewording).

So to formulate the theorem as stating '30% of ALL integers start with 1' is misleading.
Because we can divide the integers into nine subsets, each subset holding the integers starting with a certain digit. Now we cannot say that the set holding integers starting with 1 is any larger than the others. It can only be shown that the sets are all countably infinite, and we can construct a bijection between any two of them.
For any finite subset of the integers such a comparison of sizes is possible, but not for all integers.

So the theorem is to be understood literally, or probabilistically as described 'for real data sets, for random limited data sets, as a limit going over baskets of logarithmic size, as an average of distributions of all possible ranges of integers 1-N', but I object to the '30% of all integers start with 1'.

Comment ?


Relative

Agree mostly with your first para - but the proof is much stronger than that - discount your average limited set qualifiers - the proof covers all integers - period! The fact it needs logarithmic grouping to converge isn't a trick - it's the essence of only this solution being in total harmony with the underlying problem.

Para 2 - practically extremely difficult (see my other post on a perfect random number generator and my posts in SCoats debates on randomness) but randomness and distributions are not tied into this proof.

Para 3 - totally disagree - that is exactly how I was taught and what the theorem and Hill's proof actually show; it is that powerful to speak to infinite sets using limit theory and convergence tests. Personally I find it a lot more amazing that it works for relatively small sets of numbers - a thousand at a time or less - and that it can be applied to real world data from such a wide variety of sources.

Para 4 - hopefully my POV on the theorem and the law are clearer now. You may find it hard to swallow - but that's why it took so many decades to arrive at such a comprehensive, complete proof. Powerful statements about infinite sets, even with modern day mathematics, can take years or decades to mature.


I see that number theory has really advanced beyond intuition a lot.

I thought about the statement: "30% of all integers begin with one" a bit more. I found one of the below meanings is what I understand intuitively as the essence of the statement.

- Choose an integer at random. 30% of the time it will begin with 1.
** but this is not possible to do **
- Go from 1 up. Calculate % of numbers beginning with 1 on the way. % will vary from 11.111* to a little over 50%. On average it will be 30%.
** this is less satisfying, but still intuitive **
- Study a family of sequences of sets of integers that cover the entire set of integers. The family of sequences is shown to possess statistical properties that are equal to properties of the whole set of integers. Check existence of the limit of the % of first digits.
** this requires a bit of intuition-stretching **
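The second reading above ("go from 1 up") can be tried out directly. The sketch below follows the running percentage of integers beginning with 1 across one decade and averages it two ways: a plain average over N, and an average in which each N is weighted by 1/N. The 1/N weighting is an assumption on my part, chosen as one simple way to mimic baskets of logarithmic size; it is not taken from the proof mentioned in this thread, and the names are hypothetical.

```python
import math


def sweep_decade(k: int):
    """Follow the running share of leading-1 integers for N in 10^k .. 10^(k+1)-1."""
    start, stop = 10**k, 10 ** (k + 1)
    # Leading-1 integers below 10^k: 1 + 10 + ... + 10^(k-1).
    hits = (10**k - 1) // 9
    lowest, highest = 1.0, 0.0
    plain_sum = weighted_sum = weight_total = 0.0
    for n in range(start, stop):
        if n < 2 * start:  # n begins with 1 inside this decade
            hits += 1
        share = hits / n
        lowest = min(lowest, share)
        highest = max(highest, share)
        plain_sum += share
        weighted_sum += share / n  # weight each N by 1/N
        weight_total += 1 / n
    plain_avg = plain_sum / (stop - start)
    weighted_avg = weighted_sum / weight_total
    return lowest, highest, plain_avg, weighted_avg


lowest, highest, plain_avg, weighted_avg = sweep_decade(4)
print(f"lowest  {lowest:.4f}")
print(f"highest {highest:.4f}")
print(f"plain average over N      {plain_avg:.4f}")
print(f"1/N-weighted average      {weighted_avg:.4f}")
print(f"log10(2)                  {math.log10(2):.4f}")
```

In this run the share swings between about 11.1% and about 55.6%. The plain average over N comes out nearer 24% than 30%, while the 1/N-weighted average lands very close to log10(2), so which "average" is meant makes a real difference.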

I noticed Thomas mentioned the Cauchy criterion, which at first glance cannot hold for sequences of integers and the first-digit distribution: "limit .. no matter HOW the x goes to infinity". [says to himself -> The methods of number theory will have to be studied]

I will definitely try to read the proof as soon as I have the time. I hope I'll be able to understand it :wink:


Note pseudo-random numbers do display this trend very well.

Yes, I thought finding a Cauchy sequence for weird groups would be hard, as would doing the count for certain odd groups - but the folk who elegantly worked out the formulae are simply damned smarter than me!

The weird thing I never asked was whether, if we counted from infinity in rather than from zero out, the frequencies would be exactly reversed :smile:

But limit theory doesn't yet go from infinity in AFAIK.

I wrote a program like the one I posted not to prove the theory but to try and disprove it as rubbish. We were given the theory one day and the proof over the 3 following days. On the afternoon of day 1 I wrote that program because the idea seemed idiotic to me. The lecturer was more than surprised the next day: he said to me "No one has ever done this before, this is amazing", which surprised me (perhaps he meant no student, but pure mathematicians didn't usually test things with applied tests!). Next he asked me where I got my random number generator from - answer: the Head of the Maths department. I had gone to see my brother, who worked for the head of Maths, and just happened to bump into both of them at the end of solving a large number theory problem. I said what I wanted to do and asked for the best random number generator you could program - he (Prof Jim Cannon - Sydney Uni) gave me the mod by primes and the actual primes himself, as I had no way back then of finding such massive primes.

I was blown away when the program showed such quick convergence to the theory. By 10,000 numbers it was looking rock solid which freaked me. When I took it to the lecturer the next day at the start of the lecture he went thru my actual results and presented it to the class before the formal proof began. The class was stunned too that you could see this trend appear so quickly with a 1 page program run for less than an hour.
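The program described here is not reproduced in this part of the thread, so the following is not that program. It is a minimal sketch of one way to get logarithmic first digits out of an ordinary pseudo-random generator, under the assumption (mine) that the numbers are spread evenly over orders of magnitude, i.e. 10 raised to a uniform exponent. Plain uniform integers in a fixed range would not behave this way. The function name, the seed and the range 1 to 10^6 are all arbitrary choices for the example.

```python
import math
import random
from collections import Counter


def first_digit_shares(samples: int, seed: int = 0) -> dict:
    """First-digit frequencies of numbers drawn log-uniformly from [1, 10^6]."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    counts = Counter()
    for _ in range(samples):
        value = 10 ** rng.uniform(0, 6)  # uniform exponent, not uniform value
        counts[int(str(int(value))[0])] += 1
    return {d: counts[d] / samples for d in range(1, 10)}


shares = first_digit_shares(100_000)
for d in range(1, 10):
    expected = math.log10(1 + 1 / d)
    print(d, f"observed {shares[d]:.4f}", f"log law {expected:.4f}")
```

With a sample of this size the observed shares should sit within a fraction of a percentage point of the logarithmic values, which is at least consistent with the quick convergence described above.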

Until I could see that counting for humans is finite and starts from 0 outwards - not from infinity inwards - I had no metaphor to grapple with. But I perceived an analogy with the car odometer: it starts at 0 and slowly goes up, and at each boundary change the leading digit starts at 1 and stays there for the longest time. So I saw it's all because humans count from 0 outwards that we get this strange rule, which is partly right.

And now forensic auditors, underwriters and claim assessors are having a field day with this one, because it does apply so strongly even on small data sets. Even knowing the theorem makes it hard to cheat - because any random sample - from any sets of periods or selection of widely priced items or geographies across any slice of years - should show this trend. That is simply amazing. Any random selection of company data can be used to detect fraud in just a single area == WoW!!!


Too trippy.

