For my current project I plan to group a pandas DataFrame by stock_symbol as the first criterion and by quarter as the second.

From other threads I gathered that a construct such as group_data = df.groupby(['stock_symbol', 'quarter']) might be a solution here. In my case, however, the only terminal output I get is <pandas.core.groupby.generic.DataFrameGroupBy object at 0x11fdcbf10>.

Can anyone spot the flaw in my thinking behind this line? The relevant section of code looks like this:

# Datetime conversion
df['date'] = pd.to_datetime(df['date'])
# Adding of 'Quarter' column
df['quarter'] = df['date'].dt.to_period('Q')
# Grouping both the Stock Symbol and the Quarter column
group_data = df.groupby(['stock_symbol', 'quarter'])
print(group_data)

The function I want to apply to each group is shown below:

from sklearn.feature_extraction.text import CountVectorizer

# Word frequency analysis
def get_top_n_bigram(corpus, n=None):
    vec = CountVectorizer(ngram_range=(2, 2), stop_words='english').fit(corpus)
    bag_of_words = vec.transform(corpus)
    sum_words = bag_of_words.sum(axis=0)
    words_freq = [(word, sum_words[0, idx]) for word, idx in vec.vocabulary_.items()]
    words_freq = sorted(words_freq, key=lambda x: x[1], reverse=True)
    return words_freq[:n]

The DataFrame has the following structure:

[
{"gld_index": "1-0", "stock_symbol": "AMG", "gld_id": "7172", "date": "2013-01-01", "author_job_title": "Current Employee - Vice President", "author_location": "Prides Crossing, MA", "txt_main": "I have been working at Affiliated Managers Group full-time (More than 5 years)", "txt_pro": "AMG has built and continues to develop its position as a world-class asset management company. Working in this entrepreneurial culture enables smart, driven and focused individuals the opportunity to further the company’s mission to be a global leader in the asset management industry. If you are looking for an intellectually challenging workplace there are few competitive firms within the financial services industry that offer a comparable range of professional opportunities. As AMG grows and expands its footprint, there will continue to be new and exciting positions within the business for employees to advance their careers.", "txt_con": "Given the “campus” is located north of Boston, meeting up for drinks with friends and associates after work can be challenging.", "txt_adviceMgmt": null, "rating_recommend": 2, "rating_outlook": 2, "rating_ceo": 2, "scr_avg": 5.0, "scr_balance": 5.0, "scr_values": 5.0, "scr_opportunities": 5.0, "scr_benefits": 5.0, "scr_management": 5.0},
{"gld_index": "1-1", "stock_symbol": "AMG", "gld_id": "7172", "date": "2014-03-13", "author_job_title": "Former Contractor - Anonymous Contractor", "author_location": "Beverly, MA", "txt_main": "I worked at Affiliated Managers Group as a contractor (Less than a year)", "txt_pro": "No reason whatsoever to work at AMG as a temp from an employment agency especially if you are the only one from your agency - You will be depressed or go insane! They are temp/contractor unfriendly.\n\nIf you are from an audit firm, you probably don't have much choice but to go and represent your audit firm. In your case, the fact that you will most likely come as a group from your audit firm may help you maintain your sanity.\n\nThe good thing about being a part of an audit group @ AMG is that they will assign a room to your group with a large camera close-by to watch and listen to your conversations.\n\nAnother good thing is that they will provide a phone room (probably bugged with an eavesdropping device) to secretly record your personal conversations.\n\nFinally, IT just got smart by adding the word 'contractor' to the e-mail addresses of contractors or temps! This way, they are alerting everyone to filter you out from the several, daily secret e-mails!!", "txt_con": "Your employment agency will most likely paint the most beautiful picture of this gated 'castle' located somewhere in \"wealth land\" 600 Some Street in Beverly, whereby chefs cook magnificent free lunch for the employees, cozy gym in a mansion etc.....well, they probably don't know you are not allowed in the gym, you are not part of this free lunch program which is meant for the privileged full-time AMG staff. They will also forget to tell you about this secret culture @ AMG!!! Is it a cult?\n\nYou are confined to your desk or room (in case of a audit group). You are allowed to go to the bathroom assigned to your group, you can use the photocopy machine (Camera? Where?). 
If you are the type that likes to stretch your legs after sitting for long hours, be careful where you go and how often!! HR will probably just appear like a fairy in front of you from nowhere and ask if they can walk you around (knowing fully well that you have worked there for months and you know your way around your confinement!)\n\nWhat is all this secrecy about? It seems to be more than just protecting vital information (which is normal for any company and understandable considering the nature of their business). Is there more to it?", "txt_adviceMgmt": "Use some of your charitable contributions to provide food for contractors/ temps. You can begin charity from home! Food is way too cheap in America!!!....especially the type of nature-abundant leaves and flowers you serve for lunch.\n\nAlso, keep everyone healthy irrespective of employment status- Open the gym for contractors too!! They will work more efficiently (great ROI). Remember to put posters all over the gym to read \" shhhhh\". This way full-time employees will remember not to discuss those secrets!!!\n\nGive some incentives to those front desk girls....they'll stay longer!", "rating_recommend": 0, "rating_outlook": 1, "rating_ceo": 1, "scr_avg": 1.0, "scr_balance": 1.0, "scr_values": 1.0, "scr_opportunities": 1.0, "scr_benefits": 1.0, "scr_management": 1.0},
{"gld_index": "1-2", "stock_symbol": "AMG", "gld_id": "7172", "date": "2011-09-15", "author_job_title": "Former Employee - Anonymous Employee", "author_location": "Beverly, MA", "txt_main": "Smart, driven, risk-oriented people; intellectually challenging environment; innovator in its industry so there is always something new going on; long hours and stressful at times but very respectful of personal commitments- they strike the right balance; compensation is very good, benefits are phenomenal, and expectations about both are very clear; IT and HR departments are the best I've ever worked with.", "txt_pro": "Smart, driven, risk-oriented people; intellectually challenging environment; innovator in its industry so there is always something new going on; long hours and stressful at times but very respectful of personal commitments- they strike the right balance; compensation is very good, benefits are phenomenal, and expectations about both are very clear; IT and HR departments are the best I've ever worked with.", "txt_con": "The only downside to AMG is that because it is so successful, people don't leave very often, so there is very little upward mobility. It is also a relatively lean organization so there aren't many management levels (a good thing mostly). Thus, if you are a subject expert and are happy with a role that allows you to flourish in your subject area, then this is a great place to be. Similarly, if you want a job that you can leverage into a better opportunity down the road, this is a great stepping stone. However, if you are looking for a place to join and move around or \"climb the ladder,\" you will be frustrated.", "txt_adviceMgmt": null, "rating_recommend": 2, "rating_outlook": null, "rating_ceo": 2, "scr_avg": 4.0, "scr_balance": 5.0, "scr_values": null, "scr_opportunities": 4.0, "scr_benefits": 5.0, "scr_management": 4.5},
{"gld_index": "1-0", "stock_symbol": "MMM", "gld_id": "446", "date": "2017-05-14", "author_job_title": "Current Employee - Technical Aide", "author_location": "Maplewood, MN", "txt_main": "I have been working at 3M part-time (More than 3 years)", "txt_pro": "Respectful treatment, flexible hours, trainings and events, networking", "txt_con": "Not easy to move up, very competitive hiring process (150+ candidates for FT jobs)", "txt_adviceMgmt": null, "rating_recommend": 2, "rating_outlook": 1, "rating_ceo": 2, "scr_avg": 4.0, "scr_balance": 4.0, "scr_values": 5.0, "scr_opportunities": 3.0, "scr_benefits": 3.0, "scr_management": 4.0}
]
0
M. S. July 18, 2020, 17:30

1 answer

Best answer

Here is one way to achieve what you are after:

Custom function:

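from sklearn.feature_extraction.text import CountVectorizer

def get_top_n_bigram(group):
    # groupby().apply() passes each group as a DataFrame, not a single row
    text_cols = ['txt_main', 'txt_pro', 'txt_con', 'txt_adviceMgmt']
    # Replace missing text with '' (CountVectorizer rejects NaN), then join
    # the text columns of each review with spaces
    corpus = group[text_cols].fillna('').agg(' '.join, axis=1)
    n = 2  # the top n
    vec = CountVectorizer(ngram_range=(2, 2), stop_words='english').fit(corpus)
    bag_of_words = vec.transform(corpus)
    sum_words = bag_of_words.sum(axis=0)
    words_freq = [(word, sum_words[0, idx]) for word, idx in vec.vocabulary_.items()]
    words_freq = sorted(words_freq, key=lambda x: x[1], reverse=True)
    return words_freq[:n]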
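
For intuition, the counting that CountVectorizer does with ngram_range=(2, 2) can be sketched with the standard library alone. This is a simplified illustration, not a replacement: the helper name top_bigrams_plain and the sample sentences are made up for this example, and unlike the function above it does no stop-word removal. The tokenisation (lowercasing, tokens of two or more word characters) only roughly mirrors CountVectorizer's defaults.

```python
import re
from collections import Counter

def top_bigrams_plain(docs, n=2):
    """Count adjacent word pairs across all documents (no stop-word removal)."""
    counts = Counter()
    for doc in docs:
        # Lowercase and keep tokens of two or more word characters
        tokens = re.findall(r"\b\w\w+\b", doc.lower())
        # Pair each token with its successor to form bigrams
        counts.update(" ".join(pair) for pair in zip(tokens, tokens[1:]))
    return counts.most_common(n)

# Hypothetical sample documents
docs = ["asset management firm", "global asset management"]
result = top_bigrams_plain(docs)
print(result)
```

The most frequent bigram here is "asset management" with a count of 2; every other bigram occurs once.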

Call groupby and then apply, passing get_top_n_bigram:

df['date'] = pd.to_datetime(df['date'])
df['quarter'] = df['date'].dt.to_period('Q')
newdf = df.groupby(['stock_symbol', 'quarter']).apply(get_top_n_bigram).to_frame(name = 'frequencies')

print(newdf)
                                                  frequencies
stock_symbol quarter                                             
AMG          2011Q3         [(smart driven, 2), (driven risk, 2)]
             2013Q1   [(asset management, 2), (smart working, 1)]
             2014Q1     [(audit firm, 3), (employment agency, 2)]
MMM          2017Q2               [(working 3m, 1), (3m time, 1)]
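
As for why print(group_data) in the question showed only an object reference: as far as I understand it, groupby by itself just records how the rows should be split and does not compute anything until you aggregate, apply a function, or iterate over the groups. A minimal sketch with made-up toy data (only the column names are taken from the question):

```python
import pandas as pd

# Toy data (hypothetical values, same column names as in the question)
df = pd.DataFrame({
    "stock_symbol": ["AMG", "AMG", "AMG", "MMM"],
    "date": ["2013-01-01", "2013-02-10", "2014-03-13", "2017-05-14"],
    "txt_main": ["first review", "second review", "third review", "fourth review"],
})
df["date"] = pd.to_datetime(df["date"])
df["quarter"] = df["date"].dt.to_period("Q")

grouped = df.groupby(["stock_symbol", "quarter"])

# Printing the GroupBy object only shows its repr; nothing is computed yet
print(grouped)

# An aggregation forces the computation: number of rows per group
group_sizes = grouped.size()
print(group_sizes)

# Iterating yields (key, sub-DataFrame) pairs, one per group
group_keys = []
for key, group in grouped:
    group_keys.append((key[0], str(key[1])))
    print(key, len(group))
```

So the grouping line in the question was fine; what was missing was a step that actually consumes the groups.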
2
siamak safari July 18, 2020, 15:49