
How to Backtest a Discretionary Chart Strategy Without Overfitting

April 6, 2026 · Bullsights Learning

You have a setup that "always works" on last year's charts.

You trade it live. You take three losses in a row. You tell yourself the market changed. Maybe it did. More often, you never had a real edge. You had a story that fit the past too well.

That is overfitting in plain clothes. Not a quant lab problem. A discretionary trader problem when you cherry-pick examples, tweak rules after every loss, and call it a backtest.

This guide shows how to backtest a trading strategy you actually trade from charts: manual replay with screenshots, honest sample sizes, a forward test that punishes curve fitting, journal tags that keep data clean, and the traps that make backtests lie.

For the full screenshot workflow (entry, stop, targets), start with our pillar guide on how to analyze a trading chart screenshot with AI.

Who this is for

This fits you if:

You trade discretionary setups (breakouts, retests, structure breaks) from TradingView or your broker.
You have rules in your head but no written log of how they performed.
You tried indicator-heavy backtests and they did not match how you actually click.
You want a process for backtesting a trading strategy that you can run in a weekend without coding.

If you want a Python bot that optimizes 40 parameters, this is not that article. If you want to know whether your chart playbook survives contact with history, keep reading.

Why screenshot backtests beat fantasy stats

Most retail "backtests" are slideshows. You scroll old charts, circle winners, and skip the ugly sessions. Your brain fills in perfect entries you never would have taken live.

A real discretionary backtest needs three things algo tests hide:

1. Frozen context. What did the chart look like at decision time, not after the move?
2. Written rules before the outcome. Entry, stop, target, and invalidation in words, not vibes.
3. A log you cannot edit retroactively. Date, tag, screenshot file name, result in R.

Screenshots solve (1). A journal template solves (2) and (3). That is why this method pairs with a trading journal built on chart screenshots.

API-driven backtests are useful for systematic strategies. For chart traders, the gap between "historical fill model" and "I hesitated two ticks" is huge. Manual replay closes that gap slowly but honestly.

Define the strategy in one page (before you scroll history)

You cannot backtest fog. Write a one-page playbook before you open old data.

Include:

Markets and sessions you trade (e.g. US index futures RTH only)
Timeframes for bias vs entry (e.g. 1H context, 5m trigger)
Setup definition in one sentence (e.g. "Break and retest long after BOS in uptrend")
Entry trigger (close above level, limit at retest, etc.)
Stop rule (below which swing, with buffer or not)
Target rule (fixed R, structure high, partials)
Filters (no trade first 15 minutes, no trade into major news, etc.)
Hard skips (chop, no clear structure, conflicting higher timeframe)

If you cannot explain invalidation on a screenshot, you do not have a strategy. You have a mood.

Structure labeling matters here. If your rules use BOS or CHoCH, align definitions with our market structure trading guide so tags stay consistent across hundreds of samples.

Manual backtest workflow (screenshot replay)

This is the core loop. Repeat it the same way every sample.

Step 1: Pick a historical window

Choose a block of time you did not trade live (or use a symbol you rarely traded). Six to twelve months is a common starting point for intraday setups. Longer is better if you can stay focused.

Split mentally into:

In-sample period: where you develop and count the edge (e.g. months 1 to 8)
Holdout period: untouched until the end (e.g. months 9 to 12)

Touching the holdout early is how discretionary traders lie to themselves. Guard it like a separate exam.

Step 2: Walk forward bar by bar (or session by session)

Open charts without indicators you did not use at the time. Scroll one session at a time. At each potential setup:

Pause at the decision candle (before outcome is known).
Screenshot or save chart state as it would have looked live.
Write plan fields: bias, entry, stop, target, setup tag.
Advance price and record outcome in R (risk units), not dollar P&L.

R normalizes winners and losers. A +2R win and a -1R loss mean something comparable across size changes. Dollar columns tempt you to judge strategy by luck on one outsized position.
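To make the R scoring above concrete, here is a minimal Python sketch. It assumes 1R is the distance from entry to the planned stop and infers trade direction from which side of entry the stop sits. The function name and all prices are hypothetical, chosen only for illustration.

```python
def r_multiple(entry: float, stop: float, exit_price: float) -> float:
    """Return a trade result in R, where 1R = distance from entry to stop.

    Assumption: a stop below entry means a long trade,
    a stop above entry means a short trade.
    """
    risk = abs(entry - stop)
    if risk == 0:
        raise ValueError("entry and stop must differ; risk per unit is zero")
    direction = 1.0 if stop < entry else -1.0
    return direction * (exit_price - entry) / risk


# Hypothetical prices for illustration only.
long_win = r_multiple(entry=100.0, stop=98.0, exit_price=104.0)   # target hit: +2R
long_loss = r_multiple(entry=100.0, stop=98.0, exit_price=98.0)   # stopped out: -1R
short_win = r_multiple(entry=50.0, stop=51.0, exit_price=48.5)    # short trade: +1.5R
```

Position size never enters the calculation, which is the point: the same plan scores the same R whether you traded one contract or ten.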

Step 3: Score only trades that pass filters

If your rule says "no trade in chop," and you take it anyway in replay because you know it worked, you contaminated the sample. No tag, no trade. Skipped charts still get a note: "skipped: no BOS" counts as discipline data.

Step 4: Store files like a lab

Folder pattern:

Backtest / [strategy-name] / in-sample / YYYY-MM-DD_SYMBOL_TF_tag.jpg

Plus a CSV or doc row: date, symbol, tag, entry, stop, target, result R, notes.
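If you keep the log as a CSV, the naming pattern and row layout above could be sketched in Python like this. The helper names, the column identifiers, and the sample trade are assumptions for illustration. The extra screenshot column is also an assumption, added so each row points at its image file. The example writes to an in-memory buffer so nothing touches disk.

```python
import csv
import io
from datetime import date
from pathlib import PurePosixPath

# Columns follow the row described above, plus a screenshot column (assumption).
LOG_COLUMNS = ["date", "symbol", "tag", "entry", "stop", "target",
               "result_r", "notes", "screenshot"]


def screenshot_path(strategy, phase, trade_date, symbol, timeframe, tag):
    """Build Backtest/<strategy>/<phase>/YYYY-MM-DD_SYMBOL_TF_tag.jpg."""
    filename = f"{trade_date.isoformat()}_{symbol}_{timeframe}_{tag}.jpg"
    return PurePosixPath("Backtest") / strategy / phase / filename


def append_log_row(handle, row):
    """Append one trade to the log, writing the header if the log is empty."""
    writer = csv.DictWriter(handle, fieldnames=LOG_COLUMNS)
    if handle.tell() == 0:
        writer.writeheader()
    writer.writerow(row)


# Hypothetical trade for illustration only.
path = screenshot_path("bos-retest", "in-sample", date(2025, 3, 14),
                       "ES", "5m", "bos-retest-long")
buffer = io.StringIO()  # swap for open("log.csv", "a", newline="") on disk
append_log_row(buffer, {
    "date": "2025-03-14", "symbol": "ES", "tag": "bos-retest-long",
    "entry": 5650.00, "stop": 5646.00, "target": 5658.00,
    "result_r": 2.0, "notes": "clean retest of broken high",
    "screenshot": str(path),
})
```

Building the file name from the same fields you log keeps the screenshot and the row from drifting apart.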

[Image: Desk with printed chart screenshots, labeled folders, and a backtest score sheet for discretionary trades]

Step 5: Summarize with boring math

After the in-sample pass, compute:

Sample size (N): number of tagged trades that met rules
Win rate: wins divided by N
Average win (R): mean R on winners
Average loss (R): mean absolute R on losers (usually 1R if stops are honest)
Expectancy per trade: (win rate × avg win) minus (loss rate × avg loss)

You do not need a platform. A spreadsheet is enough. Expectancy gets a fuller treatment in the companion article on trading expectancy and risk of ruin. Here, use it as a gate: positive expectancy on in-sample data is necessary, not sufficient.
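If you would rather script the summary than build spreadsheet formulas, the five numbers above can be sketched in a few lines of Python. This assumes results are already scored in R, that breakeven trades are logged as 0R, and that a win is any result above zero. The function name and the ten sample results are hypothetical and far below any useful N.

```python
from statistics import mean


def summarize(results_r):
    """Compute N, win rate, average win, average loss, and expectancy in R."""
    n = len(results_r)
    if n == 0:
        raise ValueError("no tagged trades to summarize")
    wins = [r for r in results_r if r > 0]
    losses = [-r for r in results_r if r < 0]  # stored as positive sizes
    win_rate = len(wins) / n
    loss_rate = len(losses) / n
    avg_win = mean(wins) if wins else 0.0
    avg_loss = mean(losses) if losses else 0.0
    return {
        "n": n,
        "win_rate": win_rate,
        "avg_win_r": avg_win,
        "avg_loss_r": avg_loss,
        # (win rate x avg win) minus (loss rate x avg loss)
        "expectancy_r": win_rate * avg_win - loss_rate * avg_loss,
    }


# Hypothetical in-sample results in R, for illustration only.
sample = [2.0, -1.0, -1.0, 1.5, -1.0, 0.0, 2.0, -1.0, 2.5, -1.0]
stats = summarize(sample)
# 4 wins averaging 2R, 5 losses at 1R, 1 breakeven: about +0.30R per trade
```

With breakevens scored as 0R, this expectancy equals the plain average of the result column, which makes a quick cross-check against the spreadsheet.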

Sample size rules (when your backtest is lying)

Small N is the silent killer. Twenty amazing trades prove nothing except you can find twenty examples.

Use these minimum guidelines for discretionary chart strategies:

Situation	Minimum tagged trades (rule-following)
Learning if setup appears often enough	30 to 50
Rough expectancy estimate	50 to 100
Comfort before size increase	100+ on in-sample, then holdout confirm

If you only get eight samples in six months, your filters may be too tight or your definition is too vague. Both are findings. Tight filters with tiny N mean you cannot know edge yet. Vague definitions mean you are counting different trades under one name.

Rule of thumb: If changing one trade (removing the best winner) flips expectancy from positive to negative, your N is too small. Keep collecting.
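The rule of thumb is easy to automate. The sketch below assumes expectancy is taken as the plain average of R results, which matches the win-rate formula when breakevens are logged as 0R. It also treats an expectancy of exactly zero after removal as a failure, which is a judgment call. Function names and sample lists are hypothetical.

```python
def expectancy(results_r):
    """Average result per trade in R."""
    if not results_r:
        raise ValueError("need at least one scored trade")
    return sum(results_r) / len(results_r)


def best_winner_check(results_r):
    """Compare expectancy with and without the single best trade."""
    if len(results_r) < 2:
        raise ValueError("need at least two trades to drop one")
    trimmed = sorted(results_r)[:-1]  # drop the largest R result
    full = expectancy(results_r)
    without_best = expectancy(trimmed)
    return {
        "full": full,
        "without_best": without_best,
        # Fragile: positive overall, but not once the best winner is removed.
        "fragile": full > 0 and without_best <= 0,
    }


# Hypothetical samples for illustration only.
one_hero_trade = [5.0, -1.0, -1.0, -1.0, 1.0, -1.0, -1.0]
steady_edge = [2.0, -1.0, 2.0, -1.0, 2.0, -1.0, 2.0, -1.0]

hero_report = best_winner_check(one_hero_trade)   # fragile: one trade carries it
steady_report = best_winner_check(steady_edge)    # stays positive without the best
```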

Also track regime slices: trend days vs chop days, high vol vs low vol. A strategy that only works on trend days is valid if your pre-market filter skips chop. Tag those regimes explicitly using ideas from a pre-market trade plan checklist.

Forward test: the anti-overfitting phase

In-sample replay teaches what could have worked. Forward test teaches what still works when you pretend you do not know the close.

Protocol

Freeze the playbook after in-sample review. No new rules mid-test.
Run the holdout period with the same screenshot and logging steps. Do not peek at results until N reaches your minimum or the window ends.
Trade paper or minimum size live for the next 20 to 30 occurrences after holdout. Same tags, same screenshots, same R scoring.
Compare three numbers: in-sample expectancy, holdout expectancy, live forward expectancy.

Acceptable drift: small drop in win rate or average R because execution is harder live. Red flag: holdout negative while in-sample was stellar. That pattern screams curve fit or regime change.
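The three-number comparison can be sketched as a small Python helper. It assumes each phase is a list of results in R and that expectancy is their plain average. It reports the drift between phases but does not set a threshold for how much drift is acceptable, because that remains your call. The only hard flag is the one named above: positive in-sample with negative holdout. Names and sample data are hypothetical.

```python
def expectancy(results_r):
    """Average result per trade in R."""
    if not results_r:
        raise ValueError("need at least one scored trade")
    return sum(results_r) / len(results_r)


def compare_phases(in_sample, holdout, forward):
    """Line up expectancy across in-sample, holdout, and live forward test."""
    exp_in = expectancy(in_sample)
    exp_hold = expectancy(holdout)
    exp_fwd = expectancy(forward)
    return {
        "in_sample": exp_in,
        "holdout": exp_hold,
        "forward": exp_fwd,
        "holdout_drift": exp_hold - exp_in,
        "forward_drift": exp_fwd - exp_hold,
        # Positive in-sample but negative holdout: curve fit or regime change.
        "red_flag": exp_in > 0 and exp_hold < 0,
    }


# Hypothetical results in R, far below a real minimum N.
report = compare_phases(
    in_sample=[2.0, -1.0, 2.0, -1.0, -1.0, 2.0, -1.0, 2.0],
    holdout=[-1.0, -1.0, 2.0, -1.0, -1.0, -1.0],
    forward=[-1.0, 2.0, -1.0, -1.0],
)
```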

Forward test length beats forward test intensity. Thirty small honest trades beat five perfect hero trades.

Journal tags that keep backtest data clean

Tags are how you turn screenshots into statistics. One sloppy tag ruins review.

Use one primary tag per trade, chosen from a fixed list you write in advance:

bos-retest-long / bos-retest-short
range-fade (only if your playbook allows it)
liquidity-sweep-reversal
no-trade-skip (for filtered setups you correctly avoided)

Add optional modifier tags (max two):

htf-aligned / htf-against
open-drive / midday-chop / close-rotation
news-day / no-news

Do not invent tags mid-backtest because one trade felt special. If a trade does not fit, your strategy note needs a revision after the test ends, not during.
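If your log lives in a spreadsheet export or CSV, a short Python check can enforce the fixed list. The sketch below uses the example tags from this section. Your own playbook's list will differ, and the function name is hypothetical.

```python
# Example tag lists from this section; replace with your own fixed list.
PRIMARY_TAGS = frozenset({
    "bos-retest-long", "bos-retest-short", "range-fade",
    "liquidity-sweep-reversal", "no-trade-skip",
})
MODIFIER_TAGS = frozenset({
    "htf-aligned", "htf-against",
    "open-drive", "midday-chop", "close-rotation",
    "news-day", "no-news",
})
MAX_MODIFIERS = 2


def validate_tags(primary, modifiers=()):
    """Return a list of problems; an empty list means the row is clean."""
    problems = []
    if primary not in PRIMARY_TAGS:
        problems.append(f"unknown primary tag: {primary}")
    if len(modifiers) > MAX_MODIFIERS:
        problems.append(f"too many modifiers: {len(modifiers)} > {MAX_MODIFIERS}")
    for tag in modifiers:
        if tag not in MODIFIER_TAGS:
            problems.append(f"unknown modifier tag: {tag}")
    return problems


clean = validate_tags("bos-retest-long", ["htf-aligned", "open-drive"])
invented = validate_tags("felt-special", ["htf-aligned"])
overloaded = validate_tags("range-fade", ["htf-against", "midday-chop", "news-day"])
```

Run it over the whole log before computing any statistics, so a tag invented mid-test surfaces as an error instead of quietly creating a new category.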

Pair tags with the journal fields from the weekly review system: plan vs execution flag, exit reason, result in R. During backtest, execution is always "plan followed" by definition. During live forward test, "partial" and "no" rows become gold.

Common overfitting traps (discretionary edition)

Trap: rule shopping after every loss.
You lose twice, add a filter, win three times, declare victory. Fix: batch rule changes only at scheduled review (e.g. every 50 samples).

Trap: counting almost-trades as wins.
Price "almost" hit target but you would have moved stop. Fix: score the plan you wrote at entry, not the story after.

Trap: mixing timeframes without tagging.
5m entry with 1H bias sometimes, sometimes not. Fix: tag htf-aligned or split into two strategies.

Trap: invisible parameters.
"Strong momentum" is not a parameter. "Close in top third of range after BOS" is closer. If it cannot be checked on a screenshot, it is not a rule.

Trap: optimizing stop tightness to maximize win rate.
Tight stops inflate wins until one normal wick erases a month. Fix: stops from structure, sized with a 1% risk workflow.

Trap: ignoring costs.
Spread, commission, and slippage matter on scalps. Subtract a conservative fraction of R on small targets.
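One way to subtract costs is to express them in R as well: the cost per trade divided by the amount risked per trade. The Python sketch below assumes you can estimate a round-trip cost in account currency. The function name and dollar figures are hypothetical.

```python
def net_r(gross_r: float, cost_per_trade: float, risk_per_trade: float) -> float:
    """Subtract round-trip costs, expressed in R, from a gross R result."""
    if risk_per_trade <= 0:
        raise ValueError("risk per trade must be positive")
    return gross_r - cost_per_trade / risk_per_trade


# Hypothetical numbers: 5 in round-trip costs against 100 risked per trade.
win_after_costs = net_r(1.0, cost_per_trade=5.0, risk_per_trade=100.0)    # about +0.95R
loss_after_costs = net_r(-1.0, cost_per_trade=5.0, risk_per_trade=100.0)  # about -1.05R
```

The same cost takes a larger bite when the amount risked per trade is smaller, which is why scalps with small targets feel it most.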

Trap: one heroic symbol.
An edge on NASDAQ alone does not prove an edge on EURUSD. Tag symbol clusters or run separate backtests.

Trap: skipping multi-timeframe conflict.
If your playbook uses higher timeframe bias, log it. See multi-timeframe analysis workflow for consistent context captures.

How AI chart analysis fits (without cheating)

AI tools can speed up drafting entry, stop, and target levels from a screenshot. They do not replace replay discipline.

Safe uses during backtest:

Generate a structured plan from a frozen screenshot faster than manual typing
Compare your marked structure to AI labels as a second opinion
Batch-review whether stops align with swing logic

Unsafe uses:

Running AI on after-the-fact charts and pretending that was your live view
Adding indicators AI suggested that were not in your frozen playbook
Letting AI "find" setups you would not have noticed without hints

Bullsights is built for structured plans from screenshots, not for rewriting history. Use it on the image at decision time, then score the plan you would have taken.

Pre-backtest and post-backtest checklist

Before you start

One-page playbook written (setup, entry, stop, target, filters)
Fixed tag list (primary + modifiers)
In-sample and holdout dates chosen
Folder structure and log template ready
Minimum N target set (50+ for expectancy claims)

During in-sample

Screenshot at decision time for every tagged trade
Skipped setups logged with reason
Results recorded in R only
No rule changes until scheduled review

After in-sample, before live size

Holdout replay complete at same N standard
Expectancy positive in-sample and holdout (or you stop)
Worst losing streak noted in R (drawdown psychology)
Forward test plan: paper or min size, 20 to 30 trades
Position sizing rules linked to equity, not conviction

Failing either of the two gates (tiny N or negative holdout) means no size increase. It means more samples or a simpler playbook.

FAQ

Can I backtest without TradingView replay mode?

Yes. Any chart platform that lets you scroll historical bars session by session works. The tool matters less than freezing the decision-time view and logging R.

How long should a manual backtest take?

For 50 to 100 samples, expect several focused sessions across a week, not one tired night. Quality beats speed.

What win rate is "good" for a discretionary strategy?

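Win rate alone is meaningless without average R. A 40% win rate with +2R average winners can beat a 60% win rate with +0.8R winners: with 1R average losses, that is (0.4 × 2) minus (0.6 × 1) = +0.20R per trade versus (0.6 × 0.8) minus (0.4 × 1) = +0.08R. Track expectancy, not bragging rights.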

Should I include breakeven trades as wins?

No. Score breakeven as 0R or a small negative if you paid spread. Be consistent.

When do I know the strategy is "ready" for real size?

When holdout and forward test show positive expectancy at your minimum N, and you followed the playbook on live forward trades without silent rule changes. Then use position sizing from your chart stop, not excitement.

Does Bullsights run backtests for me?

No. Bullsights turns chart screenshots into structured trade plans (entries, stops, targets, scenarios, macro context) using specialized analysis. You still own replay, tagging, and honest scoring.

Bottom line

A trading strategy backtest for chart traders is not a magic number from a website. It is a pile of decision-time screenshots, clean tags, enough samples to trust the math, and a forward test that refuses your best excuses.

Write the playbook first. Replay in-sample with R scoring. Respect minimum N. Protect a holdout window. Forward test small before you size up. Kill curve-fit rules at scheduled review, not after every red day.

When you want structured entry, stop, and target drafts from the same screenshots you use in replay, try Bullsights. Upload the chart at decision time. Log the plan. Let the backtest judge the idea, not your memory.