Data Study Log

[Python] Word_Count

standingR 2024. 3. 28. 12:55

1. Basic explanation

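The most basic example of putting a dict to work: word count
=> how many times each word appears
word & its frequency (stored as a linked key/value pair)
i.e.) scan through the long text I've collected / am processing.....
and count every time a given word appears!!!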

ex)

songs = "The snow glows white on the mountain tonight Not a footprint to be seen A kingdom of isolation And it looks like I'm the queen"
songs

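Unit of analysis for the given long text (string): words
    ++ to handle this rigorously, the sentences would need part-of-speech processing
Here: word-based (simply splitting on whitespace)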

songs.split()

['The',
 'snow',
 'glows',
 'white',
 'on',
 'the',
 'mountain',
 'tonight',
 'Not',
 'a',
 'footprint',
 'to',
 'be',
 'seen',
 'A',
 'kingdom',
 'of',
 'isolation',
 'And',
 'it',
 'looks',
 'like',
 "I'm",
 'the',
 'queen']
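One detail worth knowing about the split step: `str.split()` with no argument splits on any run of whitespace (spaces, tabs, newlines) and drops empty strings, while `str.split(" ")` splits on every single space character. The small sample string below is made up purely to show the difference; with the song lyrics above, which use single spaces only, both calls happen to give the same list.

```python
# Hypothetical sample text: two spaces, a tab, and a trailing newline.
text = "snow  glows\twhite\n"

# No argument: split on runs of whitespace, no empty strings.
print(text.split())     # ['snow', 'glows', 'white']

# Explicit " ": split on each single space only; tabs/newlines stay inside tokens,
# and the double space produces an empty string.
print(text.split(" "))  # ['snow', '', 'glows\twhite\n']
```

For word counting, `split()` with no argument is usually the safer choice, since an empty string or a tab-joined token would otherwise be counted as a "word".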

- The actual TO DO

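TO DO: loop over the word list above: for
=> count how many times each word appears
case1) a newly seen word: register the word as a key with value 1
case2) a word seen before: look up the word key -> update its value with +1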

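words = songs.split()  # split on whitespace, same as above
# the structure I'll use to process the words input!
counts = {}  # key: word, value: frequency
# => loop over the collected words
for word in words:  # compound-word handling ....
    # 1) handle upper/lower case: normalize to lowercase
    word = word.lower()  # The => the
    # 2) count the frequency of each input word
    # ==> already registered? check whether the key exists ---> in
    # (`in` on a list: is the value there?, `in` on a dict: is the key there?)
    if word in counts:  # True ---> already-registered word: update with +1
        counts[word] += 1
    else:  # first occurrence: a new word
        counts[word] = 1
counts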

Output

{'the': 3,
 'snow': 1,
 'glows': 1,
 'white': 1,
 'on': 1,
 'mountain': 1,
 'tonight': 1,
 'not': 1,
 'a': 2,
 'footprint': 1,
 'to': 1,
 'be': 1,
 'seen': 1,
 'kingdom': 1,
 'of': 1,
 'isolation': 1,
 'and': 1,
 'it': 1,
 'looks': 1,
 'like': 1,
 "i'm": 1,
 'queen': 1}
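The two cases in the loop above (new word vs. already-seen word) can also be written more compactly. The sketch below is not the method used in this post, just two standard-library alternatives: `dict.get(word, 0)`, which returns 0 for a word that is not yet a key, and `collections.Counter`, which does the counting itself and adds `most_common(n)` for the top-n words. It reuses the same lyrics string and the same lowercase normalization.

```python
from collections import Counter

songs = ("The snow glows white on the mountain tonight Not a footprint to be seen "
         "A kingdom of isolation And it looks like I'm the queen")

# Variant 1: dict.get(word, 0) returns 0 for a new word,
# so case1 (new word) and case2 (seen word) collapse into one line.
counts_get = {}
for word in songs.split():
    word = word.lower()
    counts_get[word] = counts_get.get(word, 0) + 1

# Variant 2: Counter counts the lowercased words directly.
counts_counter = Counter(word.lower() for word in songs.split())

# Both produce the same word -> frequency mapping.
print(dict(counts_counter) == counts_get)  # True

# most_common(n) returns the n most frequent (word, count) pairs.
print(counts_counter.most_common(2))       # [('the', 3), ('a', 2)]
```

`Counter` is a `dict` subclass, so it can be used anywhere the hand-built `counts` dict was used.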
