skills/domains/finance/financial-data-analysis/SKILL.md
Methods for acquiring, cleaning, and analyzing financial datasets for research
npx skillsauth add wentorai/research-plugins financial-data-analysisInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
A practical skill for sourcing, processing, and analyzing financial data in academic research contexts. Covers data acquisition from public APIs, cleaning workflows, and standard analytical techniques used in empirical finance research.
| Source | Data Type | Access | Python Package |
|--------|-----------|--------|---------------|
| Yahoo Finance | Prices, fundamentals | Free | yfinance |
| FRED (St. Louis Fed) | Macroeconomic indicators | Free (API key) | fredapi |
| SEC EDGAR | Company filings (10-K, 10-Q) | Free | sec-edgar-downloader |
| WRDS (Wharton) | CRSP, Compustat, IBES | University subscription | wrds |
| Alpha Vantage | Real-time and historical prices | Free tier | alpha_vantage |
import yfinance as yf
import pandas as pd
def fetch_stock_data(tickers: list[str], start: str, end: str) -> pd.DataFrame:
"""
Fetch adjusted close prices for a list of tickers.
Args:
tickers: List of ticker symbols (e.g., ['AAPL', 'MSFT'])
start: Start date (YYYY-MM-DD)
end: End date (YYYY-MM-DD)
Returns:
DataFrame with adjusted close prices
"""
data = yf.download(tickers, start=start, end=end, auto_adjust=True)
prices = data['Close'] if len(tickers) > 1 else data[['Close']]
prices.columns = tickers if len(tickers) > 1 else tickers
return prices
# Fetch 5 years of data
prices = fetch_stock_data(['AAPL', 'MSFT', 'GOOGL'], '2020-01-01', '2025-01-01')
print(prices.head())
from fredapi import Fred
fred = Fred(api_key=os.environ["FRED_API_KEY"])
# Common series for finance research
series_ids = {
'GDP': 'GDP',
'CPI': 'CPIAUCSL',
'Fed_Funds_Rate': 'FEDFUNDS',
'Unemployment': 'UNRATE',
'10Y_Treasury': 'DGS10',
'VIX': 'VIXCLS'
}
macro_data = pd.DataFrame()
for name, sid in series_ids.items():
macro_data[name] = fred.get_series(sid, observation_start='2000-01-01')
Financial data requires careful cleaning before analysis:
def clean_financial_data(df: pd.DataFrame) -> pd.DataFrame:
"""Standard cleaning pipeline for financial time series."""
cleaned = df.copy()
# 1. Handle missing values
missing_pct = cleaned.isnull().sum() / len(cleaned) * 100
print(f"Missing data:\n{missing_pct}")
# 2. Forward-fill for market holidays (max 5 days)
cleaned = cleaned.ffill(limit=5)
# 3. Remove remaining NaN rows
cleaned = cleaned.dropna()
# 4. Detect and flag outliers (>5 sigma daily returns)
returns = cleaned.pct_change()
z_scores = (returns - returns.mean()) / returns.std()
outliers = (z_scores.abs() > 5).any(axis=1)
print(f"Outlier days flagged: {outliers.sum()}")
# 5. Verify data integrity
assert cleaned.index.is_monotonic_increasing, "Index must be sorted"
assert not cleaned.duplicated().any(), "No duplicate rows allowed"
return cleaned
def compute_returns(prices: pd.DataFrame) -> dict:
"""Compute standard return metrics."""
simple_returns = prices.pct_change().dropna()
log_returns = np.log(prices / prices.shift(1)).dropna()
annualized_return = simple_returns.mean() * 252
annualized_vol = simple_returns.std() * np.sqrt(252)
sharpe_ratio = annualized_return / annualized_vol
# Maximum drawdown
cumulative = (1 + simple_returns).cumprod()
rolling_max = cumulative.cummax()
drawdown = (cumulative - rolling_max) / rolling_max
max_drawdown = drawdown.min()
return {
'annualized_return': annualized_return,
'annualized_volatility': annualized_vol,
'sharpe_ratio': sharpe_ratio,
'max_drawdown': max_drawdown
}
A common methodology in empirical finance research:
Always report both raw and risk-adjusted results, and perform robustness checks with different estimation windows and benchmark models.
Store all data processing steps in version-controlled scripts. Use pandas.DataFrame.to_parquet() for efficient storage of intermediate datasets, and document data provenance including download dates, API versions, and any filters applied.
documentation
Write Tsinghua University theses using the ThuThesis LaTeX template
development
Templates, formatting rules, and strategies for thesis and dissertation writing
documentation
Set up LaTeX templates for PhD and Master's thesis documents
documentation
Write SJTU theses using the SJTUThesis LaTeX template with full compliance