Processing math: 100%

Kaggle

[Kaggle] AIMO2 : QwQ-32B-Preview, Early Sharing Prize 얻은 대세의 Long-CoT 모델[4]

내 이름인데 윤기를 왜 못써 2025. 1. 16. 18:42

 

개요

저번에 작성했듯이, 원래 LB 최상위권은 7~10점을 유지하고 있었다. 그 축에는 당연히 Qwen-72B-Math 모델이 있었고, 다양한 생성전략으로 LB 상위권을 유지했을 것이라 예측한다.

그러던 중 갑자기 QwQ-32B-Preview 라는 모델이 등장했고, 대회에서도 원래 규칙이었던 2024년10월 이전 모델만 사용가능한 내용을 화이트리스트를 통해 허가를 받으면 허용해준다고 변경했다.

그리고, 화이트리스트에 올라가자마자 QwQ를 이용한 결과가 나오기 시작했고, Early Sharing Prize도 일부 생성에 필요한 파라미터를 최적화함으로 받을 수 있었다.

별개로, 저때 QwQ를 이용한 결과들이 리더보드에 채점됐는데, 당시 한 사람만 20점이 넘는 결과가 나와서 그 사람이 상을 받을 수 있었다.

하지만, 그 이후로 추론 결과의 Randomness에 대해 다른 참가자들은 문제를 활발하게 제기했다.(Early Sharing Prize를 받은 노트북을 그대로 실행했는데도, LB점수의 편차가 컸고, 다른 말로는 실행 리전의 환경에 따라 다른 것이 아니냐...등의 많은 얘기가 오갔다.)

 

더보기

이 것과는 별개로 나도 저 모델을 활용해서 리더보드에 제출했다. 다양한 생성 파라미터를 실험했고, 실험한 과정에서 느낀 것은 확실히 Randomness가 너무 크긴 했다.(처음 제출했을 때 19점을 받아서, 잠시나마 25등이었지만,, 지금은 20점을 받았지만 82등까지 떨어졌다.)

현재 상위권 대부분이 이 모델을 활용해서 좋은 성적을 내고 있을 것으로 추정하고, 모델의 추론 전략과 성능에 대해 기록하는 것이 의미있을 것이라 생각해서 이 글을 작성한다.

 

QwQ-32B-Preview

QwQ에 대한 Qwen 블로그의 [아티클]은 이 곳을 참조하자.

시작부터 설명이 엄청나게 철학적이다. 요약하자면 계속해서 질문하는 사람이 되어야한다..라는 정도의 내용같다.

알려진게 많이 없어서, 저 블로그 아티클을 참고해서 작성할 수밖에 없는데 자기반성과 질문을 통해 능력을 끌어올렸다고 볼 수 있다.

 

모델 스펙

  • 파라미터 수 : 32.5B
  • Layer 수 : 64개
  • GQA Heads : 40 for Q, 8 for KV
  • Context Length : 32,768 tokens

모델 스펙에서 특이한 점이라고는, 컨텍스트 길이가 32,768로 기존의 Qwen 모델과는 차이가 있다. 이는 뒤에서 생성결과를 보면 왜인지 알 수 있다.

 

벤치마크

깊은 추론을 통해 복잡한 문제에 대한 답을 내는 모델로, 벤치마크에 사용된 테스트셋도 공학, 수학, 코딩 과 같은 것들이다.

  • GPQA : 과학 분야 Grade School 수준의 Q&A 벤치마크
  • AIME : 어려운 수학 문제 테스트 벤치마크
  • MATH-500 : MATH 벤치마크의 500개 test case 벤치마크
  •  LiveCodeBench : 프로그래밍 벤치마크

QwQ 벤치마크 결과
QwQ 벤치마크 결과

QwQ-32B-Preview의 결과는 충격적으로 좋게 나왔다.

o1-preview와 비교하면 GPQA, LiveCodeBench를 제외하고 수학문제를 해결하는 벤치마크인 AIME, MATH-500에서 더 좋은 결과가 나왔다.

o1-mini와 비교하면 GPQA, MATH-500에서 더 좋은 결과를 냈다.

 

생성 예시

이제 생성 예시를 보고, 어떤 방식으로 추론을 해서 좋은 결과를 내는지 확인하자.

예시 문항 1

Find the least odd prime factor of 20198+1.

예시 답변1

So I need to find the least odd prime factor of 20198+1. Hmm, that seems pretty big, but I think I can break it down step by step.

First, I know that finding prime factors of large exponents can be tricky, but there might be some properties or theorems that can help simplify the process. Since the expression is 20198+1, maybe I can look for patterns or ways to factor this expression.

I recall that expressions of the form an+bn can sometimes be factored, especially when n is a power of 2. In this case, n=8, which is 23, so that might be useful.

Let me see if I can factor 20198+1. I know that a4+1 can be factored as (a4+1)=(a2+2a+1)(a22a+1), but that involves irrational numbers, which isn't helpful for factoring into integers. Maybe there's another way.

Wait, perhaps I can write 20198+1 as (20194)2+12, which resembles the sum of squares. I know that over the integers, sums of squares don't factor nicely, but maybe there's a different approach.

Another thought: maybe I can consider smaller exponents first and see if there's a pattern or a recurring factor that appears.

Let's try computing 20192+1, 20194+1, and so on, and see if I can find any common factors or patterns.

First, calculate 20192+1:

20192=2019×2019. I don't need the exact value right now; maybe I can just keep it as 20192 for now. So,

20192+1=20192+1. Doesn't seem immediately helpful.

Next, 20194+1. Again, it's similar to the previous expression, and I don't see an obvious factorization.

Maybe I should try to see if there are any small prime numbers that divide 20198+1. Since I'm looking for the least odd prime factor, I can start testing small primes like 3, 5, 7, 11, and so on, to see if any of them divide 20198+1.

Let's start with 3. To check if 3 divides 20198+1, I can compute 20198+1 modulo 3.

First, find 2019 modulo 3. Since 2019=3×673, 20190(mod3). Therefore, 20198080(mod3), and 20198+10+11(mod3). So, 3 does not divide 20198+1.

Next, check 5. Compute 2019 modulo 5. 2019÷5=403 with a remainder of 4, so 20194(mod5). Then, 2019848(mod5). Since 41(mod5), 48(1)81(mod5). Therefore, 20198+11+12(mod5), so 5 does not divide 20198+1.

Now, check 7. Find 2019 modulo 7. 2019÷7=288 with a remainder of 3, so 20193(mod7). Then, 2019838(mod7). Let's compute 38 modulo 7.

First, find the pattern of powers of 3 modulo 7:

313(mod7)

3292(mod7)

336(mod7) 34184(mod7)

35125(mod7)

36151(mod7)

Since 361(mod7), then 38=36+2=36×321×22(mod7). Therefore, 20198+12+13(mod7), so 7 does not divide 20198+1.

Next, check 11. Compute 2019 modulo 11. 2019÷11=183 with a remainder of 6, so 20196(mod11). Then, 2019868(mod11). Let's find a pattern for powers of 6 modulo 11:

616(mod11)

62363(mod11)

63187(mod11)

64429(mod11)

655410(mod11)

66605(mod11)

67308(mod11)

68484(mod11)

Therefore, 20198+14+15(mod11), so 11 does not divide 20198+1.

Continuing this process seems tedious. Maybe there's a better way to approach this problem.

Let me think differently. Perhaps I can consider the expression 20198+1 and see if it can be expressed in a form that reveals its prime factors.

I recall that for expressions like a4+4b4, there's a Sophie Germain identity: a4+4b4=(a2+2ab+2b2)(a22ab+2b2). Maybe I can manipulate 20198+1 into a similar form.

Wait, 20198+1=(20194)2+12. This resembles the sum of squares, but over integers, sums of squares don't factor nicely. Maybe I can consider complex numbers or some other algebraic identities, but that might not help in finding integer prime factors.

Alternatively, perhaps I can consider modulo properties to find possible prime factors. For example, if p is an odd prime that divides 20198+1, then 201981(modp), which implies 2019161(modp). This means that the order of 2019 modulo p divides 16, and since 201981(modp), the order can't divide 8. Therefore, the order must be 16.

By Fermat's Little Theorem, the order of 2019 modulo p divides p1. Therefore, p1 must be divisible by 16. So, p1(mod16).

This narrows down the possible primes p that could divide 20198+1 to those primes congruent to 1 modulo 16.

Let me list some primes that are 1 modulo 16: 17, 97, 113, 193, 241, 257, etc.

Now, I can test these primes to see if any of them divide 20198+1.

Start with 17. Compute 2019 modulo 17.

First, find 2019÷17. 17 times 118 is 2006, so 20192006=13. Therefore, 201913(mod17).

Then, 20198138(mod17).

Compute 13216916(mod17)

134=(132)21622561(mod17) 138=(134)2121(mod17)

Therefore, 20198+11+12(mod17), so 17 does not divide 20198+1.

Next, check 97. Compute 2019 modulo 97.

First, find how many times 97 fits into 2019. 97 times 20 is 1940, and 2019 - 1940 = 79. So, 201979(mod97).

Then, 20198798(mod97).

This seems a bit cumbersome to compute directly. Maybe I can find a pattern or use Fermat's Little Theorem, which states that ap11(modp) if p is prime and p does not divide a.

Here, p=97, so 79961(mod97). But I need 798, which is not directly helpful. Maybe I can compute 792, then 794, and so on.

First, compute 792(mod97): 792=6241. Now, divide 6241 by 97.

97 times 64 is 6208, so 6241 - 6208 = 33. Therefore, 79233(mod97).

Next, 794=(792)2332(mod97).

332=1089. 97 times 11 is 1067, so 1089 - 1067 = 22. Therefore, 79422(mod97). Then, 798=(794)2222(mod97).

222=484. 97 times 4 is 388, so 484 - 388 = 96. Therefore, 79896(mod97).

Thus, 20198+196+1970(mod97).

So, 97 divides 20198+1. Since 97 is the smallest prime congruent to 1 modulo 16 that we've checked and it divides 20198+1, it must be the least odd prime factor of 20198+1.

Therefore, the least odd prime factor is 97.

**Final Answer**

97

 

예시 출력의 길이가 상당하다.

출력 결과를 보면, 대답을 하면서 계속해서 자신의 생성 결과와 과정을 의심하고, 반성하고, 다시 확인하면서 엄청난 길이의 생성을 이어나간다.

 

AIMO2 생성 예시

AIMO 문제에 대한 생성 예시는 어떤지 확인해보자.

생성을 위한 파라미터는 임의의 값으로 적었기에, 최적의 값이 아닐 수 있다.

  • temperature : 0.9
  • min_p : 0.05
  • top_p : 0.9
  • max_tokens : 32,768

주의할 것은 min_p의 값을 어느정도 올려주는 것이 좋다. 안그러면 중국어를 갑자기 내뱉는경우가 왕왕 발생한다. 또한 max_tokens는 최대값인 32,768로 꼭 설정하자.

예시 문제

Three airline companies operate flights from Dodola island. Each company has a different schedule of departures. The first company departs every 100 days, the second every 120 days and the third every 150 days. What is the greatest positive integer d for which it is true that there will be d consecutive days without a flight from Dodola island, regardless of the departure times of the various airlines?

예시 답변

I'm trying to solve this problem about airline companies operating flights from Dodola island. There are three companies with different departure schedules: one every 100 days, another every 120 days, and the third every 150 days. I need to find the greatest number of consecutive days without a flight from Dodola island, regardless of when each company starts its schedule.

First, I need to understand what's being asked. It's about finding the longest sequence of days where none of the three companies has a flight. Since their schedules are periodic, I should look into the concept of least common multiples (LCM) and greatest common divisors (GCD), but I'm not sure exactly how to apply them here.

Maybe I should think about the intervals between flights. Each company has its own cycle: 100 days, 120 days, and 150 days. If I consider day 0 as the starting point, then flights depart on days that are multiples of their respective cycles.

So, flights depart on:

- Company 1: days 0, 100, 200, 300, ...

- Company 2: days 0, 120, 240, 360, ...

- Company 3: days 0, 150, 300, 450, ...

I need to find the longest sequence of days where none of these companies has a flight. In other words, find the largest number d such that there exists a sequence of d consecutive days where none of the days is a multiple of 100, 120, or 150.

This sounds like a problem involving the concept of the least common multiple and possibly the Chinese Remainder Theorem, but I'm not entirely sure.

Wait, maybe I should find the maximum number of consecutive days where none of the companies flies, no matter how their schedules are aligned.

I recall that in scheduling problems, the key is often to find the overlaps or the gaps between the schedules.

Let me consider the cycles: 100, 120, and 150 days. First, I should find the least common multiple (LCM) of these three numbers because that would be the period after which the schedules repeat in the same pattern.

To find the LCM of 100, 120, and 150, I need to factorize each number:

- 100 = 2^2 * 5^2

- 120 = 2^3 * 3 * 5

- 150 = 2 * 3 * 5^2

The LCM is the highest power of each prime present in the factorizations:

- LCM = 2^3 * 3 * 5^2 = 8 * 3 * 25 = 600

So, the schedules repeat every 600 days.

Now, within these 600 days, I need to find the longest sequence of days where none of the companies has a flight.

To do this, I need to find the days on which at least one company has a flight and then find the gaps between these days.

First, let's find all the days within 0 to 600 where at least one company has a flight.

Company 1: multiples of 100: 0, 100, 200, 300, 400, 500, 600

Company 2: multiples of 120: 0, 120, 240, 360, 480, 600

Company 3: multiples of 150: 0, 150, 300, 450, 600

Now, list all unique flight days:

0, 100, 120, 150, 200, 240, 300, 360, 400, 450, 480, 500, 600

Now, I need to find the longest sequence of consecutive days that are not in this list.

To find the gaps, I can look at the differences between consecutive flight days.

Let's list the flight days in order:

0, 100, 120, 150, 200, 240, 300, 360, 400, 450, 480, 500, 600

Now, consider the gaps between these days:

- Between 0 and 100: 99 days without a flight (days 1 to 99)

- Between 100 and 120: 19 days without a flight (days 101 to 119)

- Between 120 and 150: 29 days without a flight (days 121 to 149)

- Between 150 and 200: 49 days without a flight (days 151 to 199)

- Between 200 and 240: 39 days without a flight (days 201 to 239)

- Between 240 and 300: 59 days without a flight (days 241 to 299)

- Between 300 and 360: 59 days without a flight (days 301 to 359)

- Between 360 and 400: 39 days without a flight (days 361 to 399)

- Between 400 and 450: 49 days without a flight (days 401 to 449)

- Between 450 and 480: 29 days without a flight (days 451 to 479)

- Between 480 and 500: 19 days without a flight (days 481 to 499)

- Between 500 and 600: 99 days without a flight (days 501 to 599)

So, the longest gap here is 99 days.

But wait, is there a possibility of a longer gap depending on how the schedules are aligned?

In the above calculation, I considered the flight days based on the LCM period, assuming that the companies start their schedules on day 0. However, the problem states "regardless of the departure times of the various airlines." This means that we need to consider the worst-case scenario, where the schedules are aligned in such a way that maximizes the gap between flights.

So, perhaps the initial approach needs adjustment.

Let's think about it differently. If the companies can start their schedules at any day, we need to find the maximum number d such that there exists a sequence of d consecutive days where none of the companies has a flight, no matter how they schedule their starting days.

This sounds like finding the largest integer d such that in any sequence of d consecutive days, there is at least one day that is a multiple of 100, 120, or 150.

Wait, no. The problem is to find the greatest d for which there will be d consecutive days without a flight, regardless of the starting times.

So, I need to find the largest d such that there exists a sequence of d consecutive days where none of the days is a multiple of 100, 120, or 150, no matter how the companies align their schedules.

In other words, even if the companies choose their starting days to minimize overlap, there should still be a sequence of d consecutive days without any flights.

This seems related to the concept of the least common multiple and the inclusion-exclusion principle.

Perhaps I need to find the maximum number d such that in any d consecutive days, there is at least one day that is a multiple of 100, 120, or 150.

Wait, that's the opposite of what's being asked.

The problem is to find the greatest d such that there exists a sequence of d consecutive days without any flight, regardless of when the companies start their schedules.

So, I need to find the largest possible d where, no matter when the companies start, there is always a sequence of d consecutive days without any flight.

But that seems contradictory. If the companies can start at any day, couldn't they cover all days with their flights?

Wait, perhaps I need to find the maximum d such that there exists a d-day period without any flight for some alignment of the schedules.

Alternatively, maybe I need to find the maximum d such that in some alignment of the schedules, there is a d-day period without any flight.

This is getting confusing. Let me look for a different approach.

I recall that in scheduling problems, the maximum gap between events can be found using the concept of the least common multiple and the greatest common divisor.

Alternatively, perhaps I can model this as a problem of covering the days with the flight schedules and finding the largest gap not covered.

Given that the periods are 100, 120, and 150 days, I can think of them as three arithmetic sequences:

- Company 1: 100k for integer k >=0

- Company 2: 120m for integer m >=0

- Company 3: 150n for integer n >=0

I need to find the largest integer d such that there exists a sequence of d consecutive days where none of these sequences contain any of those days.

This seems like finding the largest integer d such that there exists some integer a where none of the days a, a+1, ..., a+d-1 are multiples of 100, 120, or 150.

In other words, I need to find the largest d where there exists an a such that for all i in 0 to d-1, a+i is not a multiple of 100, 120, or 150.

This is equivalent to finding the largest d such that there is some a where none of a, a+1, ..., a+d-1 are congruent to 0 modulo 100, 120, or 150.

Wait, but 100, 120, and 150 are not pairwise coprime, so their modular systems are interconnected.

This is getting complicated. Maybe I should consider the least common multiple again.

Since the LCM of 100, 120, and 150 is 600, as calculated earlier, the schedules repeat every 600 days.

Within these 600 days, I can analyze the coverage of flight days.

From the earlier list, the flight days within 0 to 600 are:

0, 100, 120, 150, 200, 240, 300, 360, 400, 450, 480, 500, 600

Now, to find the largest gap between these days.

Looking at the differences:

- 0 to 100: 100 days → 99 days without a flight between 1 and 99

- 100 to 120: 20 days → 19 days without a flight between 101 and 119

- 120 to 150: 30 days → 29 days without a flight between 121 and 149

- 150 to 200: 50 days → 49 days without a flight between 151 and 199

- 200 to 240: 40 days → 39 days without a flight between 201 and 239

- 240 to 300: 60 days → 59 days without a flight between 241 and 299

- 300 to 360: 60 days → 59 days without a flight between 301 and 359

- 360 to 400: 40 days → 39 days without a flight between 361 and 399

- 400 to 450: 50 days → 49 days without a flight between 401 and 449

- 450 to 480: 30 days → 29 days without a flight between 451 and 479

- 480 to 500: 20 days → 19 days without a flight between 481 and 499

- 500 to 600: 100 days → 99 days without a flight between 501 and 599

So, the maximum gap within one cycle is 99 days.

But is this the greatest possible d? Maybe there are alignments where the gap is larger.

Wait, perhaps I need to consider that the companies can start their schedules on different days, not necessarily on day 0.

In that case, the starting points of their cycles can be shifted, which might create larger gaps without flights.

Let me consider that.

Suppose company 1 starts on day s1, company 2 on day s2, and company 3 on day s3.

Then, their flight days are:

- Company 1: s1, s1 + 100, s1 + 200, ...

- Company 2: s2, s2 + 120, s2 + 240, ...

- Company 3: s3, s3 + 150, s3 + 300, ...

Now, to find the largest d such that there exists a sequence of d consecutive days where none of these sequences contain any of those days, regardless of s1, s2, and s3.

This seems complicated. Maybe I can think in terms of the minimal coverage.

I need to find the maximum d where, no matter how s1, s2, and s3 are chosen, there is always a sequence of d consecutive days without any flight.

Alternatively, it's the minimal d where there is no such sequence, but I need to find the maximal d where such a sequence exists.

This is a bit confusing. Perhaps I should think about the maximal d where there exists at least one sequence of d consecutive days without flights, for any choice of s1, s2, and s3.

Alternatively, perhaps I should consider the concept of the minimal number of days covered by the flights, and find the maximal gap.

Wait, maybe I should consider the problem in terms of the maximal d where, in the worst-case alignment of s1, s2, and s3, there is still a sequence of d consecutive days without flights.

This seems similar to finding the minimal number of days covered by the flights, and then finding the maximal gap.

Alternatively, perhaps I can think in terms of the densities of the flight days.

The density of flight days for each company is:

- Company 1: 1/100

- Company 2: 1/120

- Company 3: 1/150

But since their schedules can overlap, the total density is not just the sum.

I need to find the combined density of days with at least one flight.

Using the inclusion-exclusion principle:

Density = 1/100 + 1/120 + 1/150 - 1/LCM(100,120) - 1/LCM(100,150) - 1/LCM(120,150) + 1/LCM(100,120,150)

We already know LCM(100,120,150) = 600

Now, LCM(100,120):

100 = 2^2 * 5^2

120 = 2^3 * 3 * 5

LCM = 2^3 * 3 * 5^2 = 120 * 5 = 600

Similarly, LCM(100,150) = 300

LCM(120,150) = 600

So, density = 1/100 + 1/120 + 1/150 - 1/600 - 1/300 - 1/600 + 1/600

Let's compute this:

1/100 = 0.01

1/120 ≈ 0.008333

1/150 ≈ 0.006667

1/600 ≈ 0.001667

1/300 ≈ 0.003333

So, density ≈ 0.01 + 0.008333 + 0.006667 - 0.001667 - 0.003333 - 0.001667 + 0.001667

Calculating step by step:

0.01 + 0.008333 = 0.018333

0.018333 + 0.006667 = 0.025

Then, subtracting:

0.025 - 0.001667 ≈ 0.023333

0.023333 - 0.003333 = 0.02

0.02 - 0.001667 ≈ 0.018333

Then adding back:

0.018333 + 0.001667 = 0.02

So, the density of flight days is approximately 0.02, or 2% of the days have flights.

Therefore, 98% of the days do not have flights.

But the problem is to find the longest sequence of consecutive days without flights, regardless of the starting times.

I think I need to approach this differently.

Perhaps I can consider the maximal gap between flight days.

In statistics, the maximal gap in a uniform distribution can be modeled, but I'm not sure if that's applicable here.

Alternatively, maybe I can think in terms of the greatest common divisor (GCD) of the periods.

The GCD of 100, 120, and 150.

Let's find GCD(100,120,150).

First, find GCD(100,120):

100: 2^2 * 5^2

120: 2^3 * 3 * 5

GCD is 2^2 * 5 = 20

Now, GCD(20,150):

150: 2 * 3 * 5^2

GCD(20,150) = 2 * 5 = 10

So, the GCD is 10.

This means that the flight days are all multiples of 10.

But I'm not sure how this helps directly.

Wait, maybe I can think about the problem in terms of the greatest hole between flight days.

In other words, the maximal number of consecutive days without a flight.

Given that the periods are 100, 120, and 150, and their GCD is 10, perhaps the maximal gap is related to this.

But I need a systematic way to find this.

Let me consider the following approach:

Imagine the days are arranged in a circle every 600 days, since that's the LCM.

In this 600-day cycle, there are flight days at:

0, 100, 120, 150, 200, 240, 300, 360, 400, 450, 480, 500, 600

Now, to find the maximal gap, I need to look at the differences between consecutive flight days, considering the cycle wraps around.

So, the differences are:

- 100 - 0 = 100

- 120 - 100 = 20

- 150 - 120 = 30

- 200 - 150 = 50

- 240 - 200 = 40

- 300 - 240 = 60

- 360 - 300 = 60

- 400 - 360 = 40

- 450 - 400 = 50

- 480 - 450 = 30

- 500 - 480 = 20

- 600 - 500 = 100

- 0 - 600 = 0 (since it wraps around)

So, the maximal gap is 100 days.

But earlier, I saw that within these gaps, the number of days without a flight is 99 days (between 0 and 100, for example).

Wait, why is that?

Because day 0 has a flight, day 100 has a flight, so days 1 through 99 do not have a flight.

Similarly, between 100 and 120, it's 19 days without a flight, and so on.

So, the maximal gap without a flight is 99 days.

But is this the greatest possible d, regardless of how the companies align their schedules?

I think that 99 is the maximal possible gap without a flight, given the LCM is 600 and the flights occur every 100, 120, and 150 days.

But perhaps with different alignments, larger gaps can occur.

Wait, suppose all companies start their flights on day 0.

Then, flights occur on days that are multiples of 100, 120, or 150.

As we saw earlier, the maximal gap is 99 days.

But if the companies align their schedules differently, maybe the gap can be larger.

For example, suppose company 1 starts on day 1, company 2 on day 2, and company 3 on day 3.

Then, their flight days are:

- Company 1: 1, 101, 201, ...

- Company 2: 2, 122, 242, ...

- Company 3: 3, 153, 303, ...

Now, the first flight days are day 1, day 2, and day 3.

So, day 0 has no flight, day 1 has a flight, day 2 has a flight, day 3 has a flight, day 4 has no flight, and so on.

In this case, the gaps between flight days can be larger.

For example, between day 3 and day 101, there are 97 days without a flight (days 4 to 100).

Wait, but company 1 has a flight on day 101, company 2 on day 122, etc.

So, between day 3 and day 101, there are 97 days without a flight.

Similarly, between day 101 and day 122, there are 20 days without a flight, and so on.

So, in this alignment, the maximal gap is 97 days.

But earlier, with all companies starting on day 0, the maximal gap was 99 days.

So, in this case, the gap is smaller.

Perhaps the maximal d is achieved when the companies start their schedules in such a way that the flight days are as spread out as possible.

But I need to find the greatest d such that there will always be d consecutive days without a flight, regardless of how the companies align their schedules.

In other words, no matter how the companies choose their starting days, there will always be at least one sequence of d consecutive days without any flight.

This seems like finding the minimal coverage of flight days across all possible alignments.

Alternatively, it's the maximal d where, in every possible alignment of the schedules, there exists at least one sequence of d consecutive days without any flight.

This is equivalent to finding the largest d such that, for any choice of s1, s2, and s3, there exists a sequence of d consecutive days where none of the companies has a flight.

This sounds like finding the largest d where the complement of the union of the flight schedules always contains at least one sequence of d consecutive days.

This seems complicated.

Maybe I can think about it in terms of the minimal number of days covered by the flights across all alignments.

Then, the maximal d would be the total number of days minus the minimal number of days covered by the flights.

But I'm not sure.

Alternatively, perhaps I can model this using the concept of the minimal period where the flight schedules repeat, and then find the maximal gap within that period.

Given that the LCM is 600, the schedules repeat every 600 days.

Within these 600 days, the flight days are as listed earlier.

Now, to find the maximal gap, I need to find the largest number of consecutive days without a flight.

From the earlier list, the maximal gap is 99 days, between day 0 and day 100, and between day 500 and day 600.

But perhaps there are alignments where the gap is larger.

Wait, suppose company 1 starts on day 99, company 2 on day 119, and company 3 on day 149.

Then, their flight days are:

- Company 1: 99, 199, 299, ...

- Company 2: 119, 239, 359, ...

- Company 3: 149, 300, 450, ...

Now, between day 0 and day 99, there are no flights because company 1 starts on day 99, but day 99 has a flight.

So, days 1 to 98 have no flight.

Similarly, between company 1's day 99 and company 2's day 119, there are 20 days without a flight (days 100 to 119).

Between company 2's day 119 and company 3's day 149, there are 30 days without a flight (days 120 to 149).

Between company 3's day 149 and company 1's day 199, there are 50 days without a flight (days 150 to 199).

And so on.

So, in this alignment, the maximal gap is 50 days.

This is less than the 99 days in the earlier alignment.

Therefore, it seems that 99 is still the maximal possible gap.

But I need to confirm this.

Alternatively, perhaps I can consider the problem in terms of the frobenius coin problem, where the largest number that cannot be expressed as a combination of the given periods.

But I'm not sure if that's applicable here.

Wait, perhaps I can think about the problem in terms of the greatest integer d such that d is less than the least common multiple of the periods minus the sum of the periods plus the greatest common divisor.

But that seems too vague.

Alternatively, perhaps I can consider the problem as finding the largest integer d such that d is less than the least common multiple of the periods divided by the sum of the reciprocals of the periods.

But that doesn't seem right either.

Maybe I need to look for a different approach.

Let me consider the following theorem: in a system of periodic events with periods p1, p2, ..., pn, the maximal gap between events is equal to the least common multiple of the periods minus the greatest common divisor of the periods.

But I'm not sure if this is a valid theorem.

Wait, perhaps I can consider that the maximal gap is LCM - GCD.

In this case, LCM is 600 and GCD is 10, so maximal gap would be 600 - 10 = 590.

But that seems too large, as earlier calculations suggest the maximal gap is only 99 days.

So, that can't be right.

Alternatively, perhaps the maximal gap is LCM - min(periods).

Here, min(periods) is 100, so maximal gap would be 600 - 100 = 500.

But again, this is larger than the 99 days observed earlier.

So, that doesn't seem correct.

Maybe I need to think differently.

Let me consider the problem as finding the largest integer d such that d is less than the minimal period minus one.

But that doesn't make sense because there are three different periods.

Alternatively, perhaps d is the minimal period minus one, which would be 100 - 1 = 99 days.

This matches the earlier observation.

But I need to confirm if this is indeed the greatest possible d.

Wait, perhaps d is equal to the smallest period minus one.

In this case, the smallest period is 100 days, so d = 100 - 1 = 99 days.

But I need to verify if this holds true for any alignment of the schedules.

Suppose company 1 starts on day 0, company 2 on day 0, and company 3 on day 0.

Then, flight days are on days that are multiples of 100, 120, or 150.

As seen earlier, the maximal gap is 99 days.

Now, suppose company 1 starts on day 99, company 2 on day 119, and company 3 on day 149.

Then, flight days are:

- Company 1: 99, 199, 299, ...

- Company 2: 119, 239, 359, ...

- Company 3: 149, 300, 450, ...

In this case, between day 0 and day 99, days 1 to 98 have no flight.

Between day 99 and day 119, days 100 to 118 have no flight.

Between day 119 and day 149, days 120 to 148 have no flight.

Between day 149 and day 199, days 150 to 198 have no flight.

And so on.

In this alignment, the maximal gap is still 99 days.

Similarly, in other alignments, it's hard to see how the maximal gap could be larger than 99 days.

Therefore, it seems that d = 99 is the greatest possible number of consecutive days without a flight, regardless of how the companies align their schedules.

But to be thorough, let's consider another alignment.

Suppose company 1 starts on day 1, company 2 on day 2, and company 3 on day 3.

Then, flight days are:

- Company 1: 1, 101, 201, ...

- Company 2: 2, 122, 242, ...

- Company 3: 3, 153, 303, ...

In this case, day 0 has no flight, day 1 has a flight, day 2 has a flight, day 3 has a flight, day 4 has no flight, and so on.

Between day 3 and day 101, there are 97 days without a flight (days 4 to 100).

Between day 101 and day 122, there are 20 days without a flight (days 102 to 121).

Between day 122 and day 153, there are 31 days without a flight (days 123 to 153).

Wait, day 153 is a flight day for company 3.

So, between day 122 and day 153, it's days 123 to 152, which is 29 days without a flight.

Wait, perhaps I miscalculated.

Company 3 has a flight on day 3, day 153, day 303, etc.

Company 1 has a flight on day 1, day 101, day 201, etc.

Company 2 has a flight on day 2, day 122, day 242, etc.

So, between day 1 and day 2, day 0 has no flight, day 1 has a flight, day 2 has a flight.

Between day 2 and day 3, day 3 has a flight.

Between day 3 and day 101, days 4 to 100 have no flight.

Between day 101 and day 122, days 102 to 121 have no flight.

Between day 122 and day 153, days 123 to 152 have no flight.

So, the maximal gap in this alignment is 97 days (days 4 to 100).

Still less than 99 days.

Therefore, it seems that 99 is indeed the maximal possible d.

Hence, the greatest positive integer d for which there will be d consecutive days without a flight from Dodola island, regardless of the departure times of the various airlines, is 99.

Now, the problem also asks for the remainder when this d is divided by 1000.

Since d = 99, and 99 divided by 1000 is 0 with a remainder of 99, the remainder is 99.

Therefore, the final answer is 99.

**Final Answer**

\[ \boxed{99} \]

 

너무 길다. 여기서도 보면 알 수 있듯이 자신의 생성결과물을 계속 의심하고(Wait), 검증하고(But), 다시 질문한다.

이 문항에 대한 답은 틀렸지만, 다른 예시 문항은 많이 맞췄다.

 

이제 AIMO2  전체 레퍼런스 문항에 대해서, 실험한 결과를 보자.

 

실험 결과

먼저 생성에 사용할 Sampling 파라미터는 다음과 같다.

  • temperature : 0.9
  • min_p : 0.01
  • top_p : 0.8,
  • max_tokens : 32,768

생성전략은 몇가지 다양한 System Prompt(5개)로, maj@5를 정답으로 결정했다.

 

레퍼런스 10문제 중 3문제 맞췄다.(3/10)

또한, 별개로 생성된 답안 중 실제 정답을 포함하고 있는 문제도 꽤 있었다. 

 

그럼 LB 스코어는 어떨까?

 

다양한 Sampling 파라미터와 시간제한을 고려해서, 몇가지 System Prompt를 변경하거나, 추가해서 제출하며 실험을 진행했다.

10~20점사이로 점수가 분포되어있었고, 극단의 값은 10점이었다. 대부분은 15~18점 정도로 나왔고, 20점이 최고기록이었다. 결과를 보면 알 수 있듯이, Randomness가 크다. Sampling 파라미터를 극단적으로 바꾼것은 없었거니와, 생성과정에서 복잡한 것도 없었다.

 

하지만, 기존 모델보다 확실하게 뛰어난 성능을 보여줫고, 이 모델의 학습 방법을 활용한 방식의 잠재력이 기대된다.

 

느낀점

1. Context Length는 32,768로 되어있지만, 흠..?

생성과정이 아주 길다. 따라서, 지원하는 최대 컨텍스트 길이도 엄청 길고, 최종 답안 생성까지도 아주 많은 토큰을 생성한다.

하지만, 생성하는 과정의 길이가 길어짐에 따라, 외계어 출력이 꽤 나온다. 그나마 이걸 방지하기 위해서, min_p값을 조금 높여주면, 잘 나오지는 않는 것 같았다. 

2. 끝없는 반복

문제를 해결하기 위해, 생각하는 과정(inner-thought)의 내용이 계속 반복된다. 이건, Qwen측에서도 공지되어 있는 부분이었다.

3. 틀린 과정, 정답 / 정확한 과정, 오답

할루시네이션? 이라고 봐야할까. 틀린 과정인데 정답이 나오는 경우와 정확한 과정인데 오답이 나오는 경우가 있다.

역시나 아쉬운건, 후자인 경우... 어떻게 해결할 수 있을지 고민해봐야할 문제다.


학습 방법, 데이터셋과 같은 공개되어 있는 자료가 많이 부족해서 내가 했던 실험 위주로 기록할 수 밖에 없는 것 같다.

아직 추가 학습을 통한 개선 방법을 적용하지 못했다.(GPU... 비싸서 확실하게 전략이 정해진다면, 해야할듯..)

추론 과정에서 개선할 수 있는 방법을 열심히 찾아보고 있고, 성능이 좋은 모델을 최대한 활용하면서 따라가보려고 한다. 어느 정도 성능의 변화가 없는 지금 같은 상황에서, 추가 학습으로 성능을 더 이끌어내기에 적합한 시기인 것 같다.

다음 아티클은 test-time compute scaling 방법론을 적용한 내용을 기록할 예정이다.

 

날이 갑자기 많이 추워지기도 하고, 요즘 독감이 유행이라고 하니 건강관리에 유의해야할 것 같다.

다양한 사람, 다양한 장소에서 배움을 지향합니다.
최근 당신이 흥미롭게 보고계시는 주제를 저도 알고싶습니다!
또, 제가 작성한 글이 도움이 되셨거나, 의견과 느낀점을 편하게 댓글로 남겨주세요! 🤗