Simian detects duplicate code across large codebases within seconds

Simian is a fast, language-agnostic similarity analyzer that scans
source files and reports duplicated blocks of code.

Simian analyzes any human-readable file, from modern application code to legacy systems.

  • Java, JSP
  • C, C++, C#
  • COBOL
  • Python
  • Ruby
  • PHP
  • JavaScript, TypeScript
  • HTML, XML, CSS
  • Sass, Less
  • Markdown
  • Visual Basic
  • Plain text

Prevent Code Duplication Errors

A bug is fixed in one part of a codebase. Tests pass, the change is committed, and the issue is considered resolved.

Elsewhere, a similar block of code was copied earlier to solve a different problem. That copy still contains the original bug, so the issue persists even though it appears to be fixed.

This is a common result of copy-and-paste development or independently implemented features. Duplicate code makes bugs harder to track, fix, and eliminate completely.

Simian Similarity Analyzer detects duplicated blocks across your codebase, helping you identify and remove redundancy before it leads to inconsistent behavior or hidden defects.
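
To make the idea of a "duplicated block" concrete, here is a minimal sketch in Python of one common approach: normalize each line, then hash every run of consecutive significant lines and group matching hashes. This is an illustration only and is not Simian's actual algorithm; the function names, the normalization rules (lowercasing, dropping curly braces and blank lines), and the sample files are all assumptions made for the example.

```python
import hashlib
from collections import defaultdict


def significant_lines(text):
    """Yield (line_number, normalized_line), skipping lines that end up empty.

    Assumed normalization: curly braces are dropped, whitespace is collapsed,
    and case is ignored.
    """
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.replace("{", " ").replace("}", " ")
        normalized = " ".join(stripped.split()).lower()
        if normalized:
            yield number, normalized


def find_duplicates(files, threshold=6):
    """Return {fingerprint: [(path, first_line, last_line), ...]} for every run
    of `threshold` significant lines that occurs more than once."""
    seen = defaultdict(list)
    for path, text in files.items():
        lines = list(significant_lines(text))
        for i in range(len(lines) - threshold + 1):
            window = lines[i:i + threshold]
            joined = "\n".join(content for _, content in window)
            fingerprint = hashlib.sha256(joined.encode("utf-8")).hexdigest()
            seen[fingerprint].append((path, window[0][0], window[-1][0]))
    return {fp: locs for fp, locs in seen.items() if len(locs) > 1}


# Two hypothetical files: the second is a reformatted copy of the first.
FILE_A = (
    "int total = 0;\n"
    "for (int i = 0; i < n; i++) {\n"
    "  total += values[i];\n"
    "}\n"
    "return total;\n"
)
FILE_B = (
    "// copied\n"
    "INT total = 0;\n"
    "for (int i = 0; i < n; i++)\n"
    "{\n"
    "  total += values[i];\n"
    "}\n"
    "return total;\n"
)

for fingerprint, locations in find_duplicates(
    {"A.java": FILE_A, "B.java": FILE_B}, threshold=4
).items():
    print(f"Found duplicate block {fingerprint[:12]}:")
    for path, first, last in locations:
        print(f"  Between lines {first} and {last} in {path}")
```

The sketch reports every matching window separately and does not merge overlapping windows into one longer block, which a production tool would be expected to do.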

Blazing Performance 🔥

Simian scans millions of lines of code in seconds, identifying duplicated blocks across large codebases.

In one test against the JDK 9 source code, Simian analyzed over 2.4 million lines of code across 7,714 files and reported more than 140,000 duplicate lines in under 5 seconds.

Results may vary depending on factors such as hardware, operating system, and processing options.

Simian Runs Almost Everywhere

Simian runs on the Java Virtual Machine, making it easy to use across Windows, macOS, Linux, and other environments. Run it locally, from scripts, or as part of your build pipeline.

Sample Output

The following is the standard output produced by Simian Similarity Analyzer when run against the JDK 9 source code. Results may vary depending on factors such as the hardware used and the number of duplicate lines.

Similarity Analyzer 4.1.2 - https://simian.quandarypeak.com
{failOnDuplication=true, ignoreCharacterCase=true, ignoreCurlyBraces=true, ignoreIdentifierCase=true, ignoreModifiers=true, ignoreStringCase=true, threshold=6}
Found 6 duplicate lines with fingerprint 2340e9b1e2419bcb5516a5a1d9037271 in the following files:
Between lines 70 and 82 in com/sun/corba/se/PortableActivationIDL/_ServerProxyImplBase.java
Between lines 70 and 82 in com/sun/corba/se/spi/activation/_ServerImplBase.java
Between lines 90 and 102 in org/omg/CosNaming/BindingIteratorPOA.java
Found 6 duplicate lines with fingerprint e94fb8a8017a3d05048dcdfb8bce8dff in the following files:
Between lines 101 and 111 in javax/swing/plaf/synth/SynthOptionPaneUI.java
Between lines 96 and 106 in javax/swing/plaf/synth/SynthMenuBarUI.java
Found 6 duplicate lines with fingerprint 16485a9bd0994dc56f52735c2395a7b2 in the following files:
Between lines 290 and 295 in java/time/zone/ZoneRules.java
Between lines 234 and 239 in java/time/zone/ZoneRules.java
Found 6 duplicate lines with fingerprint 7ca74bcd5707431bd195c0d867f5767e in the following files:
Between lines 380 and 398 in org/omg/DynamicAny/_DynFixedStub.java
Between lines 463 and 481 in org/omg/DynamicAny/_DynSequenceStub.java
...
Found 233 duplicate lines with fingerprint 8bc044fa6e21987c76424535dbc1fe47 in the following files:
Between lines 77 and 377 in javax/swing/plaf/nimbus/TextFieldPainter.java
Between lines 77 and 377 in javax/swing/plaf/nimbus/PasswordFieldPainter.java
Found 382 duplicate lines with fingerprint 922ba26b84cbbf0edfabb0e25189c3b4 in the following files:
Between lines 81 and 482 in com/sun/org/apache/xalan/internal/res/XSLTErrorResources_sv.java
Between lines 81 and 482 in com/sun/org/apache/xalan/internal/res/XSLTErrorResources_es.java
Found 141070 duplicate lines in 12134 blocks in 2406 files
Processed a total of 775314 significant (2402974 raw) lines in 7714 files
Processing time: 4.818sec

Simian outputs results in plain text by default, with support for XML, YAML, and editor-friendly formats.
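
Because the plain-text report is line-oriented, it is straightforward to post-process in a script or build step. The following is a minimal sketch in Python, assuming the report keeps exactly the wording shown in the sample above; other versions or formatters may word these lines differently. `parse_report`, `DuplicateBlock`, and the regular expressions are illustrative names and are not part of Simian.

```python
import re
from dataclasses import dataclass, field

# Patterns mirror the sample report; treat them as assumptions about the format.
BLOCK_RE = re.compile(
    r"^Found (\d+) duplicate lines with fingerprint ([0-9a-f]+) in the following files:$"
)
LOCATION_RE = re.compile(r"^Between lines (\d+) and (\d+) in (.+)$")
SUMMARY_RE = re.compile(r"^Found (\d+) duplicate lines in (\d+) blocks in (\d+) files$")
TOTALS_RE = re.compile(
    r"^Processed a total of (\d+) significant \((\d+) raw\) lines in (\d+) files$"
)
TIME_RE = re.compile(r"^Processing time: ([\d.]+)sec$")


@dataclass
class DuplicateBlock:
    line_count: int
    fingerprint: str
    locations: list = field(default_factory=list)  # (first_line, last_line, path)


def parse_report(text):
    """Return (blocks, summary) parsed from a plain-text report."""
    blocks, summary = [], {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if m := BLOCK_RE.match(line):
            blocks.append(DuplicateBlock(int(m[1]), m[2]))
        elif (m := LOCATION_RE.match(line)) and blocks:
            # A location line belongs to the most recent "Found ..." header.
            blocks[-1].locations.append((int(m[1]), int(m[2]), m[3]))
        elif m := SUMMARY_RE.match(line):
            summary.update(
                duplicate_lines=int(m[1]),
                duplicate_blocks=int(m[2]),
                files_with_duplicates=int(m[3]),
            )
        elif m := TOTALS_RE.match(line):
            summary.update(
                significant_lines=int(m[1]), raw_lines=int(m[2]), files=int(m[3])
            )
        elif m := TIME_RE.match(line):
            summary["seconds"] = float(m[1])
    return blocks, summary


# Lines copied from the sample output above.
SAMPLE = """\
Found 6 duplicate lines with fingerprint 16485a9bd0994dc56f52735c2395a7b2 in the following files:
Between lines 290 and 295 in java/time/zone/ZoneRules.java
Between lines 234 and 239 in java/time/zone/ZoneRules.java
Found 141070 duplicate lines in 12134 blocks in 2406 files
Processed a total of 775314 significant (2402974 raw) lines in 7714 files
Processing time: 4.818sec
"""

blocks, summary = parse_report(SAMPLE)
rate = summary["raw_lines"] / summary["seconds"]
share = summary["duplicate_lines"] / summary["significant_lines"]
print(f"{len(blocks)} block(s) parsed")
print(f"{rate:,.0f} raw lines per second")
print(f"{share:.1%} of significant lines reported as duplicates")
```

A build step could apply the same parsing to a saved report and compare `summary["duplicate_lines"]` against a project-specific limit. For anything beyond quick scripts, the XML or YAML output is likely the more robust input.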

Get Help with Simian

Join the conversation on Stack Overflow and get help with Simian from other developers.

Use Simian for the monkey work

Download at GitHub

Open Source Makes It Better

Quandary Peak Research has open-sourced Simian Similarity Analyzer for the software community. We maintain the project under the Apache License 2.0. Feedback and contributions are welcome on GitHub.